Abstract

This thesis explores how the sensors within a cell phone can be taken beyond the
current state of activity recognition. State-of-the-art sensor recognition processes
tend to focus on recognizing user activity. Using the same sensors available for
user activity classification, this thesis validates the ability to gather data about
entities separate from the user carrying the smart phone. Two experiments exploring
different sensing techniques are performed to determine the ability to classify
entities with smart phone sensor data. The first experiment focuses on classifying
stationary entities affecting the environment near a smart phone. The second
experiment focuses on classifying an automotive entity moving past a smart phone.
Using statistical and wavelet attributes to classify the entities in the two
experiments, respectively, it is possible to accurately classify entities based on
each entity’s environmental influence. The ability to sense entities, and thereby
recognize and classify a multitude of items, situations, and phenomena, opens a new
realm of possibilities for how devices perceive and react to their environment.


Table  of  Contents 


Page 

Abstract . iv 

List  of  Figures .  vii 

List  of  Tables . viii 

I.  Background  . 1 

II.  Literature  Review . 9 

2.1  Smart  Phones . 9

2.2  Sensor  Fusion . 10 

2.3  Sampling  Windows . 15 

2.4  Orientation  and  Position . 17 

2.5  Multimodal  Data  Fusion  and  Information  Presentation . 20 

2.6  Normalization  and  Classification . 22 

2.7  Perfecting  Gravity  Recognition . 27 

2.8  Multimodal  Success . 30 

2.9  Smart  Phone  Flocks . 32 

2.10  Natural  Event  Entity  Recognition  . 36 

2.11  Identifying  Clusters  of  Importance . 39

2.12  Multimodal  Activity  Recognition . 44 

2.13  Attribute  Selection . 46 

2.14  Resource  Preservation . 54 

III.  Methodology  -  Recognition  and  Identification . 58 

3.1  Experiment  Objectives . 58 

3.2  Experiment  Methodology . 60 

3.3  Experiment  Boundaries  . 60 

3.4  Experiment  Response  Variables  . 61 

3.5  Experiment  Control  Variables . 63 

3.6  Experiment  Factors  Held  Constant  . 66 

3.7  Experiment  Data  Collection  . 67 

3.8  Methodology  -  Signature  Windows . 68 

IV.  Results  and  Analysis  -  Recognition  and  Identification  . 74 

4.1  Decision  Model  Review . 74 

4.2  Decision  Model  Results  . 74 

4.3  Training  and  Test  Set  Review . 84 

4.4  Graph  Analysis . 85 

4.5  Statistical  Analysis . 88 


4.6  Statistical  Analysis  of  Attribute  Set  9 . 92 

V.  Results  and  Analysis  -  Scanning . 95 

5.1  Experiment  2 . 95 

5.2  Statistical  Attributes  . 95 

5.3  Wavelet  Decomposition  . 98 

VI.  Conclusions  . 101 

6.1  Entity  Recognition  . 101 

6.2  Implications . 101 

6.3  Further  Research . 103 

Bibliography  . 107 


List  of  Figures 


Figure  Page 

1.  Example  Magnetometer  Signatures . 69 

2.  Attribute  Set  4  Decision  Tree . 79 

3.  Attribute  Set  4  Confusion  Matrix . 80 

4.  Attribute  Sets  1-4  Confusion  Matrices . 82 

5.  Randomly  Selected  Control  Variable  Magnetometer  Data  . 86 

6.  Randomly  Selected  Control  Variable  Accelerometer  Data . 87 

7.  Randomly  Selected  Control  Variable  Gyroscope  Data . 88 

8.  Attribute  Set  9  Decision  Tree . 93

9.  Experiment  2  Decision  Trees . 96 

10.  Attribute  Set  Confusion  Matrix  For  Experiment  2 . 96 

11.  Example  Vehicle  Signatures . 98 

12.  Example  Vehicle  Decomposition . 100 


List  of  Tables 


Table  Page 

1.  Response  Variables  . 63 

2.  Control  Variables  -  Experiment  1 . 64 

3.  Control  Variables  -  Experiment  2 . 65 

4.  Data  Session  Charts  . 71 

5.  Attributes  for  the  Magnetometer,  Accelerometer,  and  Gyroscope  data . 72

6.  Discrete  Wavelet  Transform  Coefficients . 73 

7.  J4.8  Decision  Tree  Learner  Parameters  . 75 

8.  Attribute  Set  Table . 75 

9.  10-Fold  Cross-Validation  Attribute  Set  Results  . 77 

10.  Training  and  Test  Model  Attribute  Set  Results . 84 

11.  Magnetometer  Statistical  Values  for  Control  Variables . 89 

12.  Accelerometer  Statistical  Values  for  Control  Variables . 91 

13.  Gyroscope  Statistical  Values  for  Control  Variables . 92 

14.  Attribute  Set  9  T-Test  Results  . 94 

15.  Vehicle  10-Fold  Cross-Validation  Attribute  Set  Results . 97 

16.  Vehicle  10-Fold  Cross-Validation  Coefficient  Results . 98 


ENTITY  RECOGNITION  VIA  MULTIMODAL  SENSOR  FUSION  WITH 

SMART  PHONES 

I.  Background 

Over the past two decades, cell phones have exploded into a nearly ubiquitous
presence in society. The reach of cell phone use is felt not just in industrialized
nations, but also in developing nations. The cell phone has become an equalizer of
sorts, giving all people with access to cell-based communications a degree of
homogeneity in communications access. This has allowed farmers and craftsmen in
far-flung corners of the world to communicate and gain access to the global commerce
system, not to mention the familial and communal benefits inherent in better
communication.

The introduction of the smart phone ushered in yet another explosion in capability.
No longer was the personal computer a monolithic item that was expensive not just in
terms of currency, but in resource and space requirements as well. Smart phones put
the power of a computer into the palm of many more hands than would have been
possible otherwise. Coupled with the communication infrastructure required for
cell-based communications, smart phones offered numerous additional benefits. These
include, but are not limited to, enhanced communications through social applications
(apps), search capabilities to more easily seek out global connectivity, access to
medical care advice, notification of pending disasters (both natural and manmade),
and general information accessibility for everything from sports to crop planting.

The continued evolution of smart phones via increased computational power and
communications bandwidth will ultimately narrow the gap further between smart phones
and computers. Additionally, the inclusion of multimodal sensors within smart phones
gives them unique abilities unavailable to traditional computing platforms. The
American Heritage College Dictionary defines modal to be “of, relating to, or
characteristic of a mode”; as such, a multimodal sensor package is the combination
of more than one sensor, each capable of sensing different characteristics [30].
Multimodal sensors present in smart phones include, but are not limited to,
accelerometers, magnetometers, gyroscopes, microphones, thermometers, and
barometers. The sensing capability present in an average smart phone is far beyond
the sensing capability present in an average personal computer.

The inclusion of multimodal sensors within a smart phone allows the phone to be
used in manners not possible during the personal computing revolution. In addition
to offering many of the benefits of a personal computer, the smart phone’s ability
to sense the environment allows for the tracking of activities, the detection of
entities external to the smart phone, and environmental surveillance. The sensing
ability presented by the inclusion of multimodal sensors opens the door to a wide
range of possibilities, with future additions to the sensing suite offering further
expansion in what can be sensed.

The ability to detect, interrogate, and classify phenomena sensed by a smart phone
is wholly dependent on having versatile and intelligently written software paired
with the sensors on the smart phone. Prior to the development of smart phone based
activity recognition, software developers and computer scientists had been using
specialized sensing devices to detect user activity and environmental conditions.
With the addition of the first solid state accelerometer, software developers and
computer scientists quickly set to work on methods to detect user and environmental
conditions with smart phones [2, 4]. The addition of gyroscopes and magnetometers
led those developing recognition programs down an ever-widening path of discovery.
The science quickly evolved from merely recognizing the current position of a smart
phone to recognizing the location of a smart phone relative to a person, specific
transportation modes, and environmental phenomena [11, 12, 17].

An area yet to be explored in detail is the ability to look beyond the smart phone
and determine whether the sensor data gathered by the smart phone can ascertain the
presence of devices, environmental phenomena, and other entities. With the ability
to use sensor data to determine phone orientation, magnetic heading, user activity,
and more, there is plenty of existing research on which to build in pushing the
sensing capability further. Entity recognition via smart phone sensors builds on
prior research in accurate activity recognition. Attribute selection,
classification algorithm generation, and axis synthesis techniques are combined to
process sensor data and use the data in recognizing entities that affect the smart
phone’s environment [18, 44, 35, 13]. A smart phone, with its increasing array of
embedded sensors capable of sensing a diverse set of environmental attributes, is
capable of recognizing entities other than the phone and its user, provided those
entities affect the environment in ways detectable by the smart phone’s sensors.

In the detection of phenomena via a multimodal suite of smart phone sensors, the
current literature falls short when describing what is being detected. Using the
term activity to describe a user’s physical motion and/or the transportation mode
being used is accurate enough [18, 44]. Using activity to describe something acting
on an environment a smart phone is monitoring, such as an earthquake, is less clear:
while the phone is experiencing a shaking activity, the entity causing the shaking
is the earthquake. In this research, the term entity describes environmental actors
being measured by the smart phone sensors. The American Heritage College Dictionary
defines entity to be “the existence of something considered apart from its
properties” [30]. The goal of this research is to determine whether there is
validity to the use of a multimodal approach to entity detection.

Through systematic experimentation, the author proves there is validity to using a
smart phone sensor suite to detect and recognize entities acting on the environment
in manners observable by the smart phone sensors. The first experiment evaluates
the ability of a smart phone’s sensors to detect the environmental attributes
affected by entities. The environmental attributes are captured via their
respective sensors and then processed through various decision trees, proving the
ability to accurately distinguish between different but similar entities. The
second experiment evaluates the ability of a smart phone to be used as an
environmental scanner to detect the passing of a specific entity. With the smart
phone in a stationary position, an entity that generates a magnetic signature is
passed over the phone and the ability of the sensor to capture data for recognition
purposes is validated.

As hardware engineers construct cell phones with an increasing number of sensors
capable of sensing unique and varied environmental attributes, the developer and
scientist are faced with the challenge of combining the data streams from multiple
sensors to ascertain an environmental attribute. Specific combinations of sensor
data streams can be combined to detect certain user or environmental attributes.
For instance, it is possible to determine whether a user is sitting or climbing
stairs via the sensors in their cell phone [1, 37, 25, 3, 34]. In the environment,
Faulkner has shown it is possible to detect earthquakes with a cell phone [12, 11].
Through the experiments detailed within, the author proves it is possible to
recognize a large and diverse set of entities with smart phone sensor data.

Traditionally, user activity recognition has focused on the actions of a single
entity. Research has been performed to identify the smart phone sensors most able
to accurately classify which activity a user is performing. Algorithms have been
developed to recognize whether a user is walking, jogging, climbing or descending
stairs, biking, sitting, standing, or taking off or landing in an airplane
[41, 35, 1, 6]. The collection of user locations via signal triangulation and/or
GPS location analysis allows for the reporting of traffic conditions [33]. By using
cell phone data from more than one user, developers and scientists have shown the
ability to use the location data to determine the level of congestion of road and
highway systems, thus producing an awareness of a system’s status without requiring
live video feeds. The ability to collect and aggregate the data from cell phones
opens the possibility for even greater environmental awareness.

Scientists have delved beyond the task of activity classification and
identification; using the community of smart phones present in the population, they
have begun to aggregate data from multiple sensors to identify environmental
attributes. The work of Faulkner has shown that it is possible to use aggregate
data from a community of cell phone sensors to determine whether and where an
earthquake has occurred [12, 11]. This aggregation of accelerometer and gyroscopic
data shows that it is possible to acquire real world conditions from a smart
phone’s sensor suite. It is reasonable to believe that there are a multitude of
real world conditions that can be acquired, analyzed, classified, and identified
with varying degrees of accuracy by aggregating the sensor data from a smart phone.

The terms multimodal and sensor fusion have both been used to describe the
combination of multiple sensor output data streams into a product that can
determine whether a condition is being met. Multimodal sensing is the use of more
than one sensor, and sensor fusion is the use of output from more than one sensor.
Typical smart phone sensor fusion occurs in the various algorithms that determine
the orientation and/or movement of a device. The orientation and movement
algorithms use the output of a cell phone’s magnetometer, accelerometer, and
gyroscope to determine how a device is oriented. Additionally, they use the sensor
outputs to determine whether the device is experiencing tilt or sharp directional
changes. These determinations (gathered from the outputs of the sensors) are then
used as inputs in various software applications to aid in game play or measurements
(e.g., digital levels and compasses). As no single signal from the magnetometer,
accelerometer, or gyroscope is able to definitively identify whether a device is
moving or oriented in a specific direction, it is the aggregation, or fusion, of
multiple sensor output signals that is input into an algorithm to determine whether
a condition exists.

The aggregation of smart phone sensor data has value in determining the
environmental conditions affecting a smart phone. Through the careful analysis of
aggregate smart phone data, it may be possible to determine a myriad of conditions
present in the environment. Analyses beyond traffic congestion are possible with
aggregate smart phone sensor data. With the proper classification algorithms, it
should be possible to determine what modes of transportation a user is using;
whether a triathlon has just started and which leg the participants are taking part
in; whether users are at a concert or evacuating a building due to an emergency;
what appliance or machinery is present in a cell phone’s immediate environment; and
possibly even whether a large geomagnetic storm is taking place [18]. Using the
processes developed by scientists for activity recognition, the author collects,
aggregates, and processes the multimodal smart phone sensor data to accurately
classify entities. This accurate classification proves that entity recognition is
feasible with modern techniques.

The presence of accelerometers that can measure changes in the force of gravity
accurately to 0.001, gyroscopes that can measure changes in angular rate to a
resolution of 0.001 degree per second, and magnetometers that can measure changes
in the magnetic field down to a resolution of 0.000001 tesla presents the
opportunity to replace certain legacy sensors with a smart phone deployed to
monitor, analyze, and record certain events [39, 40, 7].

While the near ubiquity of smart phones and the suite of sensors they contain make
for an attractive research target, the ability to recognize external,
non-transportation entities is an area of study still very much in its early
stages. Most research on external, non-transportation entity detection and
recognition has been limited to large-scale natural events such as earthquakes
[12, 11]. There is reason to believe the algorithms developed for the analysis of a
smart phone’s environment can be used in more specialized situations. The
introduction of gyroscopic, acceleration, and magnetic detection sensors into the
vehicles and/or uniforms of military personnel and first responders could prove
useful for detecting a number of unique conditions whose existence would alert an
incident or combat command center to a situation that requires immediate attention.
Thus, the aggregation of smart phone sensor data could have implications far beyond
the life of everyday phone users.

The ability to capture recorded data about an entity, whether it be an object or an
event, from a single user is useful, but to truly get a grasp on the magnitude of
an entity, it would be necessary to get the data from as many sensors (cell phones)
as possible. Thus the ability to detect an event and send a capture request to all
smart phones within a given radius may be useful for detecting certain
environmental actors. This line of research has been explored with regard to
large-scale phenomena like earthquakes and civilian crowd movements classified as
flocks [12, 11, 24]. Combined with the ability to detect additional entities, the
research focused on detecting large-scale events offers intriguing possibilities.

A review of the literature regarding the evolution of smart phone sensor
utilization is necessary to understand how sensor utilization has changed. An
understanding of where scientists have taken the art of sensing, from before smart
phones to where we are today, helps reveal the nuanced techniques used to coax the
most accurate sensor data into algorithms for device orientation and user activity
identification. The more complex aspects of recognition, such as the classification
algorithms and attributes used, are varied; so too are the simpler aspects, such as
the computations used to stabilize a smart phone’s orientation.


II.  Literature  Review 


2.1  Smart  Phones 

The literature surrounding the use of smart phones as sensing platforms has
exploded over the past decade and shows no sign of slowing down. To get a grasp on
the state of research surrounding multimodal sensor fusion, it is best to have an
idea of how the field has evolved. The field’s evolution was guided initially by
the limited number of sensors present in the cell phone. As more sensors have been
added, scientists and developers have produced research that uses the additional
sensors to increase the ability of a smart phone to determine user activity and
increase environmental awareness. The earliest smart phone sensor programs relied
on the WiFi and baseband capabilities of the cell phone [33, 20]. Over time, cell
phone manufacturers added accelerometers, magnetometers, GPS, gyroscopes, and
barometers to their phones. As tends to happen, researchers have taken advantage of
the additional capabilities offered by smart phones and developed ever more
complicated packages to gauge and track user and environmental activity. In
addition to tapping into the latent sensing abilities offered by smart phones,
researchers have to be mindful of the features being selected to obtain information
regarding a task. Thus algorithms have been developed that seek to balance the
tracking ability offered by a smart phone with resource preservation and user
interaction (or lack of interaction in the case of training data). This review of
research literature captures the origins of sensor fusion, feature selection
activities, algorithm development, and resource utilization. The knowledge obtained
from the prior research aids in taking the next step towards entity recognition.

2.2  Sensor  Fusion


The first place to start is the why of multimodal sensor fusion. Before determining
whether there is value in fusing the data output by multiple sensors, one must
acknowledge that value exists in the act of abstracting data from sensors in the
first place. While the intuition is almost certainly that there is value in
processing sensor data, the scope and depth of what can be mined extends far beyond
what most of the populace considers possible. In matters of scope, the sensor can
be viewed first and foremost as providing a ‘status’ on the ‘state’ of a smart
phone. Between the sensors mentioned above and additional sensors in the phone such
as proximity and battery temperature sensors, the first order of business is to
provide a status to the phone. It is through such status reporting that an
algorithm in the phone determines when a phone has been lifted to an ear to make a
phone call, thus turning the screen off, or that a phone left in the sun is getting
dangerously hot, thus shutting the phone off. Accelerometers have been used in
devices with a hard disk drive to park the drive heads when certain gravity
thresholds are violated, thus protecting the data on the drive.
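As a rough illustration of the threshold idea behind drive-head parking, the sketch below flags free fall when the magnitude of the measured acceleration stays well below 1 g for several consecutive samples. The function name, the 0.3 g threshold, and the sample count are hypothetical values chosen for illustration; they are not taken from any particular device or from the cited literature.

```python
import math

G = 9.81  # standard gravity in m/s^2


def detect_free_fall(samples, threshold_g=0.3, min_consecutive=3):
    """Return the index at which free fall is first confirmed, or None.

    samples: iterable of (ax, ay, az) accelerometer readings in m/s^2.
    threshold_g and min_consecutive are illustrative assumptions.
    """
    run = 0  # number of consecutive low-magnitude samples seen so far
    for i, (ax, ay, az) in enumerate(samples):
        magnitude_g = math.sqrt(ax * ax + ay * ay + az * az) / G
        run = run + 1 if magnitude_g < threshold_g else 0
        if run >= min_consecutive:
            return i
    return None
```

A device at rest reads a magnitude near 1 g and never triggers, whereas a falling device reads near 0 g; requiring several consecutive samples is one simple way to avoid reacting to a single noisy reading.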

Moving beyond contributing to the phone’s basic operations, accelerometers were
introduced as a means to detect orientation. With display screens that can rotate
between a landscape and portrait display, the ability to recognize how a phone was
oriented proved useful. This sensor has been seized upon by game makers, as well as
those interested in detecting activity, as a means to detect what a user is doing
with their phone. And while the accelerometer is useful for distinguishing between
a portrait or landscape orientation of the phone, the inclusion of powerful 3-axis
accelerometers combined with 3-axis magnetometers allowed for fine-grained
measurements. By combining the data from an accelerometer and magnetometer, the
cell phone could now determine not just its orientation: using the device’s
magnetic heading, orientation algorithms can determine the pitch, yaw, and roll of
a device. This allows the device to be used for recording, recognizing, and
responding to very specific movement scenarios. The inclusion of a gyroscope and
barometer has allowed for even finer-grained activity detection, allowing for the
accurate detection of turns and altitude change, respectively [2].
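One common way to combine the two sensors is a tilt-compensated compass: roll and pitch are taken from the direction of gravity in the accelerometer reading, and the magnetometer reading is then rotated back to the horizontal plane to obtain yaw. The sketch below is a minimal version of that idea under an assumed axis convention (x forward, y right, z down, with a flat, resting device reading +g on z). The function name and the convention are assumptions made for illustration; actual phone platforms define their own axes and signs.

```python
import math


def orientation_from_accel_mag(accel, mag):
    """Estimate (roll, pitch, yaw) in degrees from one accelerometer
    reading and one magnetometer reading.

    Assumed convention (not from the source): x forward, y right, z down,
    and a device lying flat at rest reads accel = (0, 0, +g).
    """
    gx, gy, gz = accel
    bx, by, bz = mag

    # Roll and pitch come from the direction of gravity alone.
    roll = math.atan2(gy, gz)
    pitch = math.atan2(-gx, math.hypot(gy, gz))

    # De-rotate the magnetic field into the horizontal plane, then take
    # the heading of its horizontal component.
    sin_r, cos_r = math.sin(roll), math.cos(roll)
    sin_p, cos_p = math.sin(pitch), math.cos(pitch)
    mag_x = bx * cos_p + by * sin_p * sin_r + bz * sin_p * cos_r
    mag_y = bz * sin_r - by * cos_r
    yaw = math.atan2(mag_y, mag_x)

    return tuple(math.degrees(angle) for angle in (roll, pitch, yaw))
```

Neither sensor is sufficient alone in this sketch: the accelerometer cannot observe rotation about the gravity vector, and the magnetometer heading is only meaningful once tilt has been removed.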

Moving beyond the device and to the user, numerous studies and programs have been
developed that determine the activities of a user. Research has been done to
determine whether a user is stationary or moving, whether a user is standing or
sitting, whether a pedestrian is walking or biking, whether a pedestrian is walking
or running, whether a user is moving via pedestrian means or motorized transport,
whether a user is in a bus or a subway, and so on. Determining user activity has
merits ranging from activity classification for logging purposes to the analysis of
travel patterns for transportation system development and tuning. With GPS chips
added to the cell phone, the user is not just provided awareness of their longitude
and latitude; applications granted access to location data can analyze travel
information for traffic congestion reporting and emergency service location
reporting, and even open the possibility for disease tracking and reporting [31].
The usefulness of cell phone sensor data has moved far beyond the earliest
iterations, where the data were useful to not much more than the hardware and
software of a single cell phone user.

The inclusion of baseband and WiFi chipsets in a smart phone is ubiquitous,
insomuch that for a phone to be a smart phone, it will include a chipset to access
a communications network as well as the ability to connect to public and private
local wireless networks. Poolsawat, Pattara-Atikom, and Ngamwongwattana discuss
fusing base transceiver station (BTS) information with GPS location data to provide
status on traffic [33]. In attempting to implement an alternative to the system of
surveillance cameras and sensors that local transportation departments install to
monitor traffic conditions, Poolsawat et al. focused on cost, ease of deployment,
and systemic robustness. The costs of building and operating an effective traffic
surveillance system are high. The role of a traffic information system is to
monitor traffic conditions, process the conditions, and broadcast solutions to
certain conditions. This set of traffic monitoring tasks is ideally suited to a
hybrid system that combines some non-cell based sensors and data acquired from
users’ cell phones. Poolsawat et al. detail a system that captures data from the
endpoint (a cell phone user); the endpoint interacts with the cellular provider’s
BTS and it is this interaction that proves useful to traffic monitoring. Using
software to abstract data features from the cell phone’s BTS interaction, Poolsawat
et al. build a system that indicates the mobile country code, the mobile network
code, the location area code, and the cell ID (CID) of the BTS with which a cell
phone is currently associated. Using the data features abstracted from the BTS
information, the authors calculate a cell dwell time (CDT), the duration of time an
endpoint cell phone spends within a particular cell ID, which is sent to a
collection server for analysis. Using a history of CDT in each CID, analysis can be
performed to determine whether a user is in a congested traffic zone. Adding GPS
coordinates to the traffic system data would enable precise identification of
congestion points; however, because GPS receivers require line-of-sight
connectivity to the GPS satellites, a GPS fix is not guaranteed. By contrast, if a
cell phone can communicate with a BTS, it will always have a CID to reference. In
addition to preserving privacy by not sharing user details, the system can be
configured to preserve system resources such as bandwidth and battery life: rather
than maintaining a constant state of connectivity between end user and server, the
server would receive a feature report once every 3 minutes or so.
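
The cell dwell time idea can be sketched in a few lines. The example below collapses a time-ordered stream of (timestamp, CID) observations into per-cell dwell times and then applies a simple comparison against historical dwell times. The function names, the data layout, and the "twice the historical mean" rule are hypothetical choices made for illustration; they are not the analysis used by Poolsawat et al.

```python
def cell_dwell_times(observations):
    """Collapse time-ordered (timestamp_seconds, cell_id) observations
    into a list of (cell_id, dwell_seconds) visits.

    A visit ends when the phone is first seen in a different cell; the
    final visit ends at the last observation.
    """
    visits = []
    current_cid = None
    entered_at = None
    last_seen = None
    for timestamp, cid in observations:
        if cid != current_cid:
            if current_cid is not None:
                visits.append((current_cid, timestamp - entered_at))
            current_cid, entered_at = cid, timestamp
        last_seen = timestamp
    if current_cid is not None:
        visits.append((current_cid, last_seen - entered_at))
    return visits


def is_congested(dwell_seconds, historical_dwells, factor=2.0):
    """Flag a dwell time exceeding `factor` times the historical mean.

    The rule and the default factor are illustrative assumptions.
    """
    if not historical_dwells:
        return False
    baseline = sum(historical_dwells) / len(historical_dwells)
    return dwell_seconds > factor * baseline
```

Because only a cell ID and a duration are reported, a report of this shape carries no user details, which is consistent with the privacy and bandwidth goals described above.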


Using  accelerometer  data  from  a  user’s  cell  phone  can  produce  a  trove  of  feature 
data  by  which  to  evaluate  numerous  activities  as  well  as  potential  environmental 
attributes. However, the data output by the accelerometer would be useless if it could
not be oriented relative to gravity. As an accelerometer measures the strength of
gravity, it is possible to determine the orientation of an accelerometer (and thus the
device housing the accelerometer) relative to gravity. In the article Using Gravity
to Estimate Accelerometer Orientation, David Mizell articulated a methodology to
determine device orientation with a three-axis accelerometer [32]. By averaging the
accelerometer samples over a window, the gravity component can be estimated. Letting v
represent the average of acceleration for a given time interval window, we have:

$$v = (v_x, v_y, v_z)$$

Letting a represent the acceleration measured at a point in time within the window, we have:

$$a = (a_x, a_y, a_z)$$
Using the average v and the instantaneous a, it is possible to calculate both the static
and dynamic acceleration experienced by the accelerometer. The static acceleration
corresponds to the effect of gravity and the dynamic acceleration corresponds to the
effect a user’s activity has on the accelerometer. Letting d represent the dynamic
component of a, we find:

$$d = (a_x - v_x, a_y - v_y, a_z - v_z)$$

The orientation of the device can then be found by using vector dot products.
This is done by computing the projection p of d upon the vertical axis v as:

$$p = \left(\frac{d \cdot v}{v \cdot v}\right) v$$

where p is the vertical component of the dynamic acceleration vector d. From the
vector p we can compute the horizontal component of the dynamic acceleration:

$$h = d - p$$

Through the above equations it is possible to decompose the accelerometer readings
to obtain the gravity manifested upon the accelerometer, thus allowing the orientation
of the device to be calculated. While the intent of Mizell’s work was to prove
that device orientation could be determined by transforming accelerometer data, thus
focusing primarily on the vertical gravitational component, later work proved that
horizontal movement could reliably be determined once device orientation had been
calculated [19].
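The decomposition above can be sketched in a few lines. This is a minimal illustration
using plain tuples; the function names are hypothetical, and the gravity estimate is
simply the per-axis mean over the window, as described by Mizell.

```python
def mean_vector(samples):
    """Per-axis average of (x, y, z) samples: the gravity estimate v."""
    n = len(samples)
    return tuple(sum(s[i] for s in samples) / n for i in range(3))


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose(sample, gravity):
    """Split one sample a into dynamic (d), vertical (p), and horizontal (h) parts."""
    d = tuple(a - v for a, v in zip(sample, gravity))   # d = a - v
    scale = dot(d, gravity) / dot(gravity, gravity)     # (d . v) / (v . v)
    p = tuple(scale * v for v in gravity)               # projection of d onto v
    h = tuple(di - pi for di, pi in zip(d, p))          # h = d - p
    return d, p, h
```

For a device lying flat, with gravity along z, a sample with a sideways push along x
yields a horizontal component along x and a vertical component along z. Note that h
carries only the magnitude of horizontal motion in an unknown direction, since the
heading of the device is not observable from gravity alone.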

Though much of the prior research utilizes multimodal sensing, it is most often
performed in a complementary manner, enhancing the measurements obtained with one
sensor with added data from another [2]. In other instances a multimodal approach is
used in a supplementary manner in case one sensor fails to perform as expected [33].
The goal of this research is to determine whether an entity can be accurately
classified using both a complementary and supplementary method by building
classifiers based on all available data.

2.3  Sampling  Windows


As noted in [32], in order to achieve an average of acceleration for v, it is necessary
to designate a window length. Across the literature there are examples of various
window lengths, from a few seconds in [32] to windows of up to eight seconds elsewhere.
In the research reported in Activity Recognition on an Accelerometer Embedded Mobile
Phone with Varying Positions and Orientations, the optimal window length was found to
be between four and five seconds [41]. The goal of the research was to refine the
science of activity recognition with an accelerometer, regardless of the position and
orientation in which a user carries their cell phone. Earlier work cited by Sun, Zhang,
Li, Guo, and Li had devised methods to extract features that could be used to identify
various types of pedestrian activity, but these methods were limited in that they
required the user to mount the sensor to their body in a specific location and
orientation. Through various algorithm refinements, Sun et al. present a method to free
users from such stringent orientation requirements for accurate activity detection. In
developing their orientation-insensitive technique, Sun et al. propose using the
magnitude of the accelerometer readings to compensate for changes in device
orientation. As such, Sun et al. generate an additional feature from the accelerometer
output:

$$(A, \|A\|) = (a_x, a_y, a_z, \|(a_x, a_y, a_z)\|)$$

With the orientation-insensitive feature, Sun et al. found that an overlapping window
divided into frames was able to accurately recognize activity 93.1% of the time when
the window length was set to 4 seconds. Using an orientation-sensitive methodology
increased the optimal window to 5 seconds; this demonstrates that window length will
vary depending on the computational methods used. In both cases, the windows were
divided into frames of 1 second duration for feature extraction.
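The magnitude feature and the window-and-frame segmentation can be sketched as follows.
The 4 second window and 1 second frames follow the text; the 50% overlap, the function
names, and the list-of-tuples input are assumptions made for illustration, since the
overlap used by Sun et al. is not stated here.

```python
import math


def magnitude(sample):
    """Euclidean norm of one (ax, ay, az) reading; unchanged by device rotation."""
    ax, ay, az = sample
    return math.sqrt(ax * ax + ay * ay + az * az)


def augment(samples):
    """Append the magnitude to each reading: (ax, ay, az, ||A||)."""
    return [(ax, ay, az, magnitude((ax, ay, az))) for ax, ay, az in samples]


def sliding_windows(samples, rate_hz, window_s=4, frame_s=1, overlap=0.5):
    """Cut samples into overlapping windows, each split into fixed-length frames.

    Returns a list of windows; each window is a list of frames (lists of samples).
    """
    window_len = int(window_s * rate_hz)
    frame_len = int(frame_s * rate_hz)
    step = max(1, int(window_len * (1 - overlap)))  # assumed overlap between windows
    windows = []
    for start in range(0, len(samples) - window_len + 1, step):
        window = samples[start:start + window_len]
        frames = [window[i:i + frame_len] for i in range(0, window_len, frame_len)]
        windows.append(frames)
    return windows
```

With 8 seconds of data at 10 Hz, this yields three windows of four 1 second frames each.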

The  windows  necessary  for  activity  and  entity  recognition  are  different,  as  accurate 
classification  of  an  activity  can  be  thought  of  as  recognizing  a  pattern  that  takes  place 
over  a  relatively  long  duration  of  time  compared  to  the  recognition  of  an  entity  that 
may  be  affecting  the  environment  around  a  smart  phone  for  a  brief  duration.  Thus  a 
snapshot  of  an  activity  pattern  will  contain  the  information  necessary  to  determine 
the  activity  being  performed,  whereas  a  snapshot  of  an  entity  could  be  during  any 
number  of  potential  patterns  depending  on  the  effects  being  generated  by  the  entity. 
Optimally,  a  window  would  capture  an  entire  cycle  of  a  mode  of  operation  for  an 
entity,  eliminating  the  need  to  classify  multiple  operation  phases.  This  research  will 
show  that  utilization  of  entity  signature  snapshots  results  in  accurate  classification  of 
the  entities  used  as  control  variables. 

In addition to identifying optimal window length, Sun et al. sought to identify
means of achieving accurate activity recognition while preserving cell phone resources
[41]. After determining which sensors will be used in an activity recognition task, the
next task is to determine sampling rates and feature selection. Sampling rates will
vary greatly from activity to activity. When sampling for human activity recognition,
relatively low sampling rates of 20 to 60 Hz have proven sufficient. When sampling for
non-human activity recognition, higher sampling rates may be required; determining them
is one of the tasks of this research. In either case, a higher sampling rate may not
conserve resources, but it will provide more data for analysis. In regards to feature
selection, Sun et al. generate the following for each frame: the mean, variance,
frequency-domain entropy, FFT energy, and the correlation. When selecting features, Sun
et al. tended to select less computationally complex features to save resources. For
recognition purposes, each activity being recognized will have features for the mean,
variance, frequency-domain entropy, and FFT energy. As such, for a system capable of
recognizing standing, walking, biking, and running, there would be 16 features to train
against plus 6 more for correlation between the 4 activities. Sun et al. normalize each
extracted feature vector before training.
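The per-frame features named above can be sketched with the standard library alone. The
exact definitions used by Sun et al. are not given in this review, so the ones below are
assumptions based on common usage: energy is the sum of squared non-DC DFT magnitudes
divided by the frame length, and frequency-domain entropy is the Shannon entropy of the
normalized non-DC magnitudes. A naive DFT stands in for an FFT to keep the example
self-contained.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def frame_features(frame):
    """Mean, variance, spectral energy, and frequency-domain entropy of one frame."""
    n = len(frame)
    mean = sum(frame) / n
    variance = sum((x - mean) ** 2 for x in frame) / n
    mags = dft_magnitudes(frame)[1:]            # drop the DC component
    energy = sum(m * m for m in mags) / n
    total = sum(mags)
    if total < 1e-12:                           # flat frame: no spectral content
        entropy = 0.0
    else:
        probs = [m / total for m in mags]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return {"mean": mean, "variance": variance, "energy": energy, "entropy": entropy}


def correlation(a, b):
    """Pearson correlation between two equally long series (e.g. two axes)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

These features are cheap to compute per frame, which is consistent with the stated
preference for less computationally complex features.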


2.4  Orientation  and  Position 

Normalization, or transformation, of a signal is important when discussing generation
of features from a sensor. As noted previously, compensating for the orientation of
a user’s cell phone is important. One method involves calculating the static
gravitational component to determine which axis is vertical [32], and an alternate
method involves taking the magnitude of the accelerometer components to compensate
for device orientation [41]. A third technique utilizes the concept of weightlessness:
an accelerometer will experience weightlessness when carried on a person that is
running or jumping, thus revealing the vertical axis of the accelerometer [19]. In each
case, when generating features from sensor data it is necessary to transform the data.
Sensor data or signals are transformed into a common coordinate system in order
to improve activity recognition. As noted in Accurate Activity Recognition Using
a Mobile Phone Regardless of Device Orientation and Location, device orientation is
not the only concern [19]. The location of the cell phone is also vital to activity
recognition, as user movement will look significantly different to cell phone sensors
depending on where the device is held; a sensor at the waist will experience different
force signatures than a device strapped to the upper arm. In order to compensate
for location variation, Henpraserttae, Thiemjarus, and Marukatat devise a method of
feature training that involves different feature signatures for each body position. As
an example, the feature set for recognizing whether a person is running with a cell
phone on their arm will exhibit different characteristics from that of a person running
with a cell phone in a pocket. Robustness to device placement is achieved by creating a
model specific to each likely area of device placement.

Henpraserttae  et  al.  [19]  explore  methods  to  calculate  the  forward  axis.  Adding  to 
the  work  performed  by  Mizell  [32],  they  utilize  the  mean  of  the  dynamic  acceleration 
experienced  by  the  accelerometer.  They  assign  the  dynamic  portion  of  the  vertical 
axis  to  w  and  use  it  to  find  the  forward  axis.  Under  the  assumption  that  most  activity 
is  in  the  forward-backward  direction,  the  forward  direction  can  be  computed  from  the 
principal  axis  of  data  on  the  plane  that  is  perpendicular  to  w: 


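$$x'_t = x_t - (x_t^{\top} w)\, w$$

where x' is the acceleration signal with its component along the vertical axis removed
and x is the raw accelerometer signal. Next, an eigen-decomposition is performed on the
covariance matrix of the projected data:

$$C = \frac{1}{T} \sum_{t=1}^{T} (x'_t - \mu')(x'_t - \mu')^{\top}$$

where $\mu'$ is the mean of the projected data over the T samples, calculated by:

$$\mu' = \frac{1}{T} \sum_{t=1}^{T} x'_t$$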

The forward axis is parallel to the principal eigenvector of the covariance matrix C.
With u corresponding to the eigenvector that has the largest eigenvalue, u is used as
the forward axis in the global coordinate system of Henpraserttae et al. The sign of u
is resolved by:

$$\text{if } x'^{\top} u < 0 \text{ then } u = -u$$

Knowing u to be the forward axis and w to be the vertical axis, the last axis to
find is the sideward axis:

$$v = u \times w$$

Knowing the three axes, one can construct the transformation matrix as:

$$T = \begin{bmatrix} u_x & u_y & u_z \\ v_x & v_y & v_z \\ w_x & w_y & w_z \end{bmatrix}$$

Having established the matrix, Henpraserttae et al. use the dynamic mean to estimate
the rotational angles for when the device is placed in different orientations. The
rotational matrix is used to transform the input signal into the same reference
coordinate system regardless of orientation or placement. Thus for activity recognition
purposes, the first task is to classify the probable location and then to classify the
activity taking place by comparing the normalized values to training datasets.
Henpraserttae et al. found significant differences between classification without and
with transformation, with transformed data yielding accuracy 42 to 51% higher in
training sets with a minimal number of orientation classifications. When more
orientation classifications are trained on, the accuracy for activity recognition is
5.8% higher than without transformation. In all cases, a normalized classification
system outperformed using non-transformed data.

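The axis estimation described above can be sketched as follows, under several
assumptions: the vertical unit vector w is already known, the dominant eigenvector is
found by power iteration rather than a full eigen-decomposition, and the sign of u is
resolved against the mean of the projected data. The function names are hypothetical and
the sketch is not the authors' implementation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    norm = math.sqrt(dot(a, a))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(x / norm for x in a)


def estimate_axes(samples, w, iterations=200):
    """Return (u, v, w): forward, sideward, and vertical unit vectors."""
    # Remove the vertical component: x' = x - (x . w) w
    projected = [tuple(xi - dot(x, w) * wi for xi, wi in zip(x, w)) for x in samples]
    n = len(projected)
    mu = tuple(sum(p[i] for p in projected) / n for i in range(3))
    # Covariance matrix of the projected data.
    c = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in projected) / n
          for j in range(3)] for i in range(3)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    u = (1.0, 0.7, 0.3)  # arbitrary starting vector
    for _ in range(iterations):
        u = normalize(tuple(sum(c[i][j] * u[j] for j in range(3)) for i in range(3)))
    if dot(mu, u) < 0:   # assumed sign convention: point u along the mean motion
        u = tuple(-x for x in u)
    v = cross(u, w)      # sideward axis
    return u, v, w


def transform(sample, axes):
    """Apply T (rows u, v, w) to express a sample in the global frame."""
    return tuple(dot(axis, sample) for axis in axes)
```

Because u lies in the plane perpendicular to w, the three rows of T are mutually
orthogonal, so applying T is a pure change of coordinates.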
2.5  Multimodal  Data  Fusion  and  Information  Presentation

Moving beyond orientation and position, researchers continue to identify the most
useful sensor data streams to fuse when capturing data for later analysis and playback.
Previously discussed research has utilized BTS and GPS data to identify a cell phone’s
location. A good deal of activity recognition software utilizes GPS data in addition
to the accelerometer, though for pedestrian activity recognition purposes the GPS
data is most often treated as a perk rather than a means of detecting and classifying
a user’s activity. Research performed by Microsoft fused camera, accelerometer,
gyroscope, magnetometer, and GPS data [4]. Researchers designed the Greenfield program
as a demonstration application to help smart phone users locate their cars, though the
breadcrumb trail left by the phone could be used to locate any number of entities.
Through the use of accelerometer features, the program counted the user’s steps.
Gyroscope features helped identify turns through changes in inertia. Magnetometer
features identified compass bearing, though external interference from building
structures and items in pockets and purses limits the usefulness of the magnetometer
for determining true compass bearing. GPS location data was available in non-parking
garage scenarios, but once a vehicle was parked in a covered location, GPS data became
unreliable. The camera was used to capture the exact state and location in which a
vehicle was parked. Together, the researchers used this data to create a breadcrumb
trail that users could follow back to their vehicle with bearing, step counts, and turn
instructions. Besides providing an integrated use of fused data, specifically the
breadcrumb trail generated by accelerometer, gyroscope, and magnetometer input, the
researchers also studied the cognitive load the data presentation would place on users.
As a data presentation application, users found that Greenfield presented information
that may have been highly accurate but was mentally taxing to process.


Continuing to develop the concept of activity recognition with cell phone sensors is
important for the purposes of physical activity monitoring, personal impact and/or
exposure monitoring, and transportation and mobility-based recruitment. In an effort
to distinguish between pedestrian mobility and vehicular mobility, Reddy et al., in
Using Mobile Phones to Determine Transportation Modes, developed fine-grained
activity recognizers that worked independently of external knowledge [35]. Previous
full-featured activity recognizers used external indexes to identify likely
transportation hubs; Reddy et al. rely more heavily on a combination of GPS and
accelerometer data to identify mass transit. As GPS is found to perform satisfactorily
only for coarse-grained transportation mode classification, and then only when signals
are present, classifying systems with similar speed and acceleration profiles requires
finer-grained signature classification. The accelerometer in the iPhone 5, for
instance, is able to measure gravity with an accuracy of 4 milli-gravity [39]. Using
the accelerometer data, Reddy et al. produce accurate acceleration and braking
signatures for transportation modes that present similar speed profiles. Through the
fine-grained accelerometer data, buses, trains, and subways are classified more
accurately. In addition, through both previously developed techniques and new
techniques introduced in their research, Reddy et al. produce a more robust
solution that is device, location, and orientation agnostic.

Sensing techniques beyond GPS and accelerometers were also investigated in [35].
Reddy et al. researched using wireless infrastructure recognition to obtain
accurate transportation classification results. The research indicated that wireless
technology such as Bluetooth is not pervasive enough, and that WiFi and BTS are
too dependent on a dense distribution and are not suitable for fine-grained details. A
combination of GPS and accelerometer data was found to produce the most accurate
classification while preserving system resources. GPS data features proved useful for
determining activity from the speed distribution (range). Accelerometer data features
proved useful for determining the variance of motion changes. The combination of the
two sensors’ data features proves useful, for example, when differentiating between
walking, running, and biking. Walking and running may exhibit similar speed
characteristics based on GPS data, but the variance in accelerometer output will be
larger when running; the same holds when comparing running and biking, with similar
speed characteristics being possible and running having more accelerometer variance
than biking. With all three activities able to take place at the same location,
referencing an external database of transportation modes would not offer much fidelity
in activity recognition. However, by using GPS to determine speed, and using the data
features extracted from the accelerometer output, the accuracy of activity recognition
is increased. In regards to resource preservation, as the GPS sensor is more resource
intensive than the accelerometer, the GPS can be left off when the accelerometer is not
detecting any motion.


2.6  Normalization  and  Classification 

Reddy et al. [35] found that their techniques allow for a data window size of
one second, a 75% reduction from the four-second window presented in [41]. The
authors found that a one second window allowed for near instantaneous classification of
transportation mode. A smaller window size resulted in inaccurate activity recognition,
while a larger window size resulted in unnecessary delay in activity recognition.
Accelerometer data was normalized by taking the magnitude of the readings:

$$A_{mag} = \sqrt{(A_x)^2 + (A_y)^2 + (A_z)^2}$$

which allows for the assumption of random and possibly changing device orientation.
Features from the accelerometer data are the mean, variance, energy, and Discrete
Fourier Transform (DFT) coefficients. From the GPS data, the feature utilized was
speed, with the algorithm weeding out invalid points. Features for activity recognition
and classification were selected with correlation based feature selection (CFS). CFS
was chosen because it provides a feature subset selector that eliminates irrelevant and
redundant attributes. Examples of the utilization of the various features are: the GPS
feature used to differentiate between still and motorized transport, accelerometer
variance used to determine whether an individual is running, and accelerometer DFT data
used to differentiate between different on-foot transportation modes.

In addition to selecting the features most relevant to activity recognition from
the sensor data set, Reddy et al. explored which classification system most reliably
selected the correct activity [35]. The instance classifiers considered by Reddy et al.
were C4.5 Decision Trees (DT), K-Means Clustering (KMC), Naive Bayes (NB), Nearest
Neighbor (NN), and Support Vector Machines (SVM). Additionally, a continuous
Hidden Markov Model (CHMM) and a two-stage system involving the most accurate
instance based classifier (the C4.5 DT) combined with a discrete Hidden Markov
Model (DHMM) were considered. The classification structure functioned as follows:

Data → Noise Filtering → Feature Calculation → DT Instance Based Classifier → DHMM Classifier

with the DHMM classifier selecting the transportation mode classification. With the
above classification structure in place, the research looked at the accuracy as it
related to device placement. When the device was carried in the hand or mounted to the
upper arm, the accuracy was the highest; waist, pocket, bag, and chest placement
resulted in lower accuracy, though accuracy across all placements ranged only from
94.3% to 95.0%. Some of this lost accuracy can be made up for with user specific
training. With user specific training, the accuracy increased 2.2% as compared to the
generalized classifier. Overall, this study produced highly accurate activity
classification across both pedestrian and motorized modes, utilizing energy aware
detection to minimize resource strain without requiring user specific training or
external indexes. Lastly, it showed that accurate prediction could be achieved through
location and orientation agnostic processes.
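The second stage of such a pipeline can be illustrated with a small Viterbi decoder that
smooths the per-window labels emitted by an instance classifier. This is a sketch of the
general two-stage idea only: the state names and all probabilities in the usage note
below are invented for illustration and are not values reported by Reddy et al.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a discrete HMM (log-space Viterbi).

    observations: labels produced by the first-stage classifier, one per window.
    start_p[s], trans_p[r][s], emit_p[s][o]: nonzero probabilities (assumed inputs).
    """
    prev = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
            for s in states}
    back_pointers = []
    for obs in observations[1:]:
        cur, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this step.
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            cur[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][obs])
            pointers[s] = best
        back_pointers.append(pointers)
        prev = cur
    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: prev[s])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With two modes, a 0.9 probability of staying in the same mode, and a first-stage
classifier assumed correct 80% of the time, an isolated "bus" label inside a run of
"walk" labels is smoothed away, while a sustained change of mode is preserved.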

Expanding on which features have value when used to discriminate between activity
types, Anjum and Ilyas [1] seek, through techniques similar to those of Reddy et al.
[35], to determine the most accurate data features to extract. The research performed
in this paper was limited to classifying pedestrian means of transportation (plus
driving); recognizable activities were walking, running, ascending stairs, descending
stairs, cycling, driving, and being inactive. As in [35], a number of instance
classifiers were examined, with the C4.5 DT proving the most reliable. As a multi-modal
experiment, data streams from the accelerometer (3-axis), gyroscope (3-axis), and the
GPS (latitude, longitude, and altitude) were acquired. Anjum and Ilyas researched the
optimal sample rate for acquiring data to classify; sampling rates from 5Hz to 100Hz
were investigated, with 8Hz proving the optimal rate for human activity. As noted in
previous studies, the varying orientation of a phone does not allow for a meaningful
comparison of measurements of a particular axis’ data with the measurements from the
same axis in a different activity trace. Anjum and Ilyas chose to use an
eigen-decomposition of the covariance matrix of the 3 accelerometer axes in order to
rotate the measurements onto three orthogonal reference axes $d_1$, $d_2$, and $d_3$.
The three orthogonal axes are ordered by descending signal variation. In preprocessing,
the sample covariance of any two axes i and j is computed via:

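$$\sigma_{ij} = \frac{1}{N} \sum_{n=1}^{N} (a_i[n] - \bar{a}_i)(a_j[n] - \bar{a}_j)$$

where N denotes the number of samples and $\bar{a}_i$ represents the mean of axis i.
From this a covariance matrix is generated:

$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{bmatrix}$$

The transformation matrix D is then the product D = AV, where
$D = (d_1[n], d_2[n], d_3[n])$, $A = (a_1[n], a_2[n], a_3[n])$, and V is the matrix of
eigenvectors.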


Once the transformation matrix is completed in the preprocessing step, the authors
then extract the following features from a 5 second window: mean, standard
deviation, FFT spectral energy, frequency domain entropy, and the log of FFT [1].
Additionally, the autocorrelation function of all accelerometer signals was computed.
The autocorrelation function is computed by:

$$r_i[\tau] = \frac{1}{(N - \tau)\,\sigma_{d_i}^2} \sum_{n=1}^{N-\tau} (d_i[n] - \bar{d}_i)(d_i[n + \tau] - \bar{d}_i)$$
The mean $\bar{d}_i$ of the orthogonal reference axes was found to be of little use.
However, the mean of $r_i$ proved more useful. The variances of both $d_i$ and $r_i$
were computed. Most of the activities recognized by this research are periodic, thus
there is a need to identify the period. Period identification is a three step process
that involves finding the samples that are local maxima, computing the time difference
between successive maxima, and estimating the period of the signal as the median
inter-maxima delay. The inverse of the median inter-maxima delay is the frequency.
When fitting a linear equation to the autocorrelation function, the following equations
were used:

$$R^2 = 1 - \frac{S_e}{S_t}$$

where

$$S_t = \sum_{n=1}^{N} (r[n] - \bar{r})^2 \quad \text{and} \quad S_e = \sum_{n=1}^{N} (r[n] - \hat{r}[n])^2$$

with $\bar{r}$ the mean of r and $\hat{r}[n]$ the value of the linear fit at n.
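The three-step period estimate and the R squared computation can be sketched as below.
The normalization of the autocorrelation (division by the number of overlapping samples
and by the signal variance) and the least-squares line fit are assumptions, as are the
function names; the review does not pin down these details.

```python
import statistics


def autocorrelation(signal, max_lag):
    """Normalized autocorrelation r[0..max_lag] of a 1-D signal."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    r = []
    for lag in range(max_lag + 1):
        acc = sum((signal[i] - mean) * (signal[i + lag] - mean) for i in range(n - lag))
        r.append(acc / ((n - lag) * var))
    return r


def estimate_period(values, rate_hz):
    """Median delay, in seconds, between successive local maxima (None if fewer than 2)."""
    peaks = [i for i in range(1, len(values) - 1)
             if values[i - 1] < values[i] > values[i + 1]]
    if len(peaks) < 2:
        return None
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return statistics.median(gaps) / rate_hz


def r_squared(r):
    """R^2 = 1 - Se/St for a least-squares line fitted to r[n] against n."""
    n = len(r)
    mean_x = (n - 1) / 2
    mean_r = sum(r) / n
    slope = (sum((x - mean_x) * (y - mean_r) for x, y in enumerate(r))
             / sum((x - mean_x) ** 2 for x in range(n)))
    intercept = mean_r - slope * mean_x
    s_t = sum((y - mean_r) ** 2 for y in r)
    s_e = sum((y - (slope * x + intercept)) ** 2 for x, y in enumerate(r))
    return 1 - s_e / s_t
```

The frequency follows as the inverse of the returned period. A strongly periodic
activity produces an oscillating autocorrelation that a straight line fits poorly, so a
low R squared is one plausible indicator of periodic motion.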


Having tested the activity recognition algorithms with the above model and equations,
Anjum and Ilyas found the autocorrelation functions provided more accurate
recognition results than transformed signals, which is computationally beneficial as
the autocorrelation functions are cheaper to compute than the transformation. The
features found to be most useful were the mean as noted above, variance as noted above,
standard deviation, R squared, and the period. The majority of this research review
focused on the features extracted from the accelerometer data because Anjum and Ilyas
found the gyroscope to be of no value in their recognition algorithms. As a note, Anjum
and Ilyas found that ascending stairs was the most difficult activity to recognize
accurately; the inclusion of a barometer in more cell phone models should thus provide
a sensor of value when attempting to differentiate between ascending, descending, and
non-inclined walking.

2.7  Perfecting  Gravity  Recognition


In another activity recognition paper, Wang, Chen, and Ma [44] compared acceleration
synthesization, as done by Reddy et al. in [35]:

$$A_{mag} = \sqrt{(A_x)^2 + (A_y)^2 + (A_z)^2}$$

with acceleration decomposition, as advocated by [32]:

$$p = \left(\frac{d \cdot v}{v \cdot v}\right) v$$

Wang et al.’s focus on accelerometer data for activity recognition is based on the
previously stated premise that the accelerometer is signal independent (unlike GPS),
has low energy consumption, has instant startup, and as such is a wholly contained
sensor with no external requirements. Wang et al. extracted the following features:
mean, standard deviation, mean crossing rate, third quartile, sum and standard
deviation of frequency components between 0Hz and 2Hz, ratio of frequency components
between 0Hz and 2Hz to all frequency components, sum and standard deviation of
frequency components between 2Hz and 4Hz, ratio of frequency components between
2Hz and 4Hz to all frequency components, and spectrum peak position, for a total of
11 features. These 11 features were used for the synthesized accelerometer data. For
the decomposed data, the 11 features were applied to both the vertical and horizontal
axes with an additional feature added for the correlation coefficient between the two
series, leading to a total of 23 features. After the experiments and analysis were
performed, Wang et al. found that the synthesized features produced more accurate
results than the decomposed features. Wang et al. surmised that if the window length is
not long enough or the estimate for gravity is not accurate, the decomposition
technique will yield features less viable for accurate activity recognition than the
synthesized method. Using a DT, Wang et al. found that the decomposition technique
yielded an accuracy of 60.71% and the synthesized technique yielded an accuracy of
61.42%. Paring down the feature set through the Waikato Environment for Knowledge
Analysis (WEKA) machine learning library of algorithms changed the decomposition and
synthesized results to 60.43% and 70.73% respectively.
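A few of the less common features in this list can be sketched as follows. The
definitions are assumptions: the mean crossing rate is taken as the fraction of
consecutive sample pairs that straddle the mean, the third quartile uses the default
method of Python's statistics.quantiles, and the band features are computed over the
one-sided DFT magnitude spectrum with the DC bin included in the 0Hz to 2Hz band. The
function names are hypothetical.

```python
import cmath
import math
import statistics


def mean_crossing_rate(signal):
    """Fraction of consecutive sample pairs that lie on opposite sides of the mean."""
    mean = sum(signal) / len(signal)
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a - mean) * (b - mean) < 0)
    return crossings / (len(signal) - 1)


def third_quartile(signal):
    """75th percentile of the samples."""
    return statistics.quantiles(signal, n=4)[2]


def spectrum(signal, rate_hz):
    """One-sided DFT: (frequencies in Hz, magnitudes) for bins 0..n/2."""
    n = len(signal)
    bins = range(n // 2 + 1)
    mags = [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in bins]
    freqs = [k * rate_hz / n for k in bins]
    return freqs, mags


def band_sum_and_ratio(freqs, mags, low_hz, high_hz):
    """Sum of magnitudes in [low_hz, high_hz) and its share of the whole spectrum."""
    band = sum(m for f, m in zip(freqs, mags) if low_hz <= f < high_hz)
    total = sum(mags)
    return band, (band / total if total > 0 else 0.0)


def spectrum_peak(freqs, mags):
    """Frequency of the strongest spectral component."""
    return freqs[max(range(len(mags)), key=lambda k: mags[k])]
```

For a 1 Hz oscillation sampled at 8 Hz, nearly all spectral magnitude falls in the
0Hz to 2Hz band and the peak position is 1 Hz.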

Activity recognition via attached sensors as a science has undergone continual
refinement; with the placement of powerful and versatile sensors in cell phones, the
pace of refinement is rapid. In an extension of the previous work [44], Hemminki,
Nurmi, and Tarkoma work to improve the estimate of the gravity component found to
affect the accuracy of accelerometer decomposition [18]. Noted in Hemminki et al.’s
discussion of previous work is that while accelerometer synthesization is accurate for
pedestrian activity detection, the technique is less accurate for detecting motorized
activity. According to the research, only accurate decomposition offers the
fine-grained features necessary to observe the acceleration and deceleration patterns
of various motorized transportation mechanisms. As such, Hemminki et al. worked to
improve the computation of the gravity component. The goal is similar to earlier work
[19] where the horizontal component was computed. Knowing accurate vertical and
horizontal axes allows for more accurate identification of acceleration and
deceleration periods; also introduced is the concept of peak features to characterize
acceleration and deceleration patterns associated with different motorized modalities.

As synthesization works well with accelerometer data, synthesization should work
equally well for magnetometer and gyroscopic data. By using synthesization in the
research into entity recognition, the need to decompose the various sensor streams
is eliminated, resulting in classifiable data with less data processing. In a situation
where an entity affects the magnetic field detectable by the smart phone sensors, if
one axis on the sensor detects the field, it is likely the other two axes would detect some
magnetic change as well. By synthesizing the data, each of the axes is combined
into a single output; thus the need to assign attributes to each axis is negated.
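
As a minimal sketch of combining the three axes into a single output, the example below assumes that synthesization means taking the Euclidean magnitude of each three-axis sample. The function name and the magnetometer readings are hypothetical and serve only as illustration.

```python
import math


def synthesize(samples):
    """Collapse (x, y, z) sensor samples into one magnitude value per sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


# Hypothetical magnetometer readings (microtesla), one tuple per sample.
readings = [(3.0, 4.0, 0.0), (1.0, 2.0, 2.0), (0.0, 0.0, 5.0)]
magnitudes = synthesize(readings)
```

Because the magnitude does not depend on which axis carries the change, a single attribute set can be computed on `magnitudes` instead of one set per axis.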

As accelerometers are the principal sensor utilized when discussing physical activity
recognition, there has not been much mention of fusing other sensor data into
the algorithms on more than a minimal basis, where doing so added to the fine-grain
classification efforts. The work of Barthold, Subbu, and Dantu in Evaluation of
Gyroscope-embedded Mobile Phones explores the exploitation of gyroscope data to
determine device orientation [2]. Barthold et al. believe that the accelerations
experienced by the phone limit the usefulness of accelerometer data in determining device
orientation. While much of the previously discussed work has been about how to
make activity recognition orientation and placement agnostic, this work is oriented
more towards understanding the precise orientation of the device. Once a determination
has been made on the precise device orientation, the inertia experienced by the
gyroscope can then be used to infer direction changes. Typically the accelerometer
and magnetometer sensors are used as multi-modal sensors to determine device
orientation. Using the gyroscope to infer direction changes can be useful in indoor
and urban environments, which compromise the ability of the magnetometer to
determine device orientation. The major complication when using gyroscope data is that
gyroscopes tend to exhibit drift errors over time; the drift errors result in a decrease
(or increase) in the final result in a given time window, so a process needs to be put in
place to account for the drift. If the drift error can be overcome and neutralized, the
benefit of adding gyroscope data to device orientation determination is that the
gyroscope is immune to external accelerations and magnetic interference; thus algorithms
will be able to determine orientation even in magnetically interfered areas while the
phone is accelerating.
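
The effect of gyroscope drift can be illustrated with a small sketch. It assumes the drift behaves as a constant rate offset (bias) that is integrated along with the true rotation; this is a simplification for illustration and not the compensation process used by Barthold et al. The function name, sample rate, and bias value are hypothetical.

```python
def integrate_heading(rates, dt, bias=0.0):
    """Integrate angular rate (deg/s) into heading (deg), subtracting a bias estimate."""
    heading = 0.0
    headings = []
    for rate in rates:
        heading += (rate - bias) * dt
        headings.append(heading)
    return headings


# A stationary phone whose gyroscope reports a constant 0.5 deg/s offset (hypothetical).
measured = [0.5] * 100                      # 100 samples at 10 Hz, i.e. 10 seconds
raw = integrate_heading(measured, dt=0.1)   # drift accumulates in the result
corrected = integrate_heading(measured, dt=0.1, bias=0.5)
```

Without compensation the integrated heading grows by roughly five degrees in ten seconds even though the phone never moved, which is why the drift must be accounted for within each time window.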

The use of gyroscopes to determine smart phone orientation is an example of
multimodal orientation detection, as Barthold et al. proved that both the accelerometer
and gyroscope are capable of determining a smart phone’s orientation with varying
degrees of accuracy. Additionally, in perfecting the gyroscope orientation technique
the magnetometer was used to obtain pertinent magnetic readings, demonstrating
the relevance of multimodal sensor fusion in smart phones when it comes to detecting
device orientation. The multimodal sensor fusion used in entity recognition goes past
smart phone orientation to entities completely external to the smart phone.

2.8  Multimodal  Success 

A more recent addition to the concept of sensor fusion is the CMOS sensor based
camera present in cell phones. In the paper Using CMOS Sensors for Gamma Detection
and Classification, Cogliati, Derr, and Wharton explore using a standard cell
phone to detect gamma radiation [6]. The CMOS sensor facilitates the detection of
ionized electrons; when ionized electrons are emitted by a gamma emitting object and
make contact with a cell phone’s CMOS sensor, the sensor is capable of registering
this particle strike. The detection is based on the principles of scattering and ionizing
radiation, as well as the different energy levels associated with various types of
radiation. Using CMOS sensors to detect electrons has a large noise correction
requirement, as a CMOS sensor will detect electrons due to leaky circuits in the phone.
Heat will increase the number of electrons emitted from leaky circuits. The detection
of leaky circuits is a fairly static feature that will be visible in subsequent images; as
such, filtering will remove similar detections. Gamma rays, on the other hand,
continually produce a stream of ionized electrons; due to the nature of scattering, the
CMOS detections will strike different locations on the sensor and will not emulate leaky
circuits. In addition to compensating for thermal noise, Cogliati et al. found the need
to compensate for defective pixels. As a CMOS sensor captures images in three colors
(red, blue, and green), it is rare that all three receptors at a given pixel location are
bad. As such, the preprocessing algorithm verifies each component of each pixel
individually to determine whether it is functioning. Once leaks and bad pixels have been
identified, a number of noise removal techniques were assessed to account for signals
that did not correspond to identified leaks and defective pixel components but still may
be erroneous. Cogliati et al. used median value noise reduction, statistical methods using
the standard deviation and mean
($\text{background} = \max\{y \in x \mid y < \bar{x} + 2\sigma_x\}$), kurtosis,
and the high-delta method. The high-delta method takes the max value and second
highest value seen in a set of images and finds the difference between the two values,
thus reducing both thermal and defective pixel noise. Cogliati et al. found that using
the cell phone’s CMOS sensor and phone based data processing of images, the cell
phone is capable of functioning as a low-sensitivity dose rate meter with limited
spectrum information. While not particularly active compared to dedicated meters, the
ubiquity of cell phones makes the approach useful when other tools are not available. As
a sensor fusion example, an application (GammaPix) has been designed where the GPS,
accelerometer, and CMOS data streams are co-utilized to locate airports (GPS),
detect takeoff and landing (via accelerometer data), and monitor high-atmosphere
radiation exposure (via CMOS). Entity recognition seeks to expand on the concepts
utilized in the gamma radiation detection methods of GammaPix by sensing not just
environmental phenomena but also the entities producing the phenomena.
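
The statistical background estimate and the high-delta method described above can be sketched per pixel as follows. This is a minimal illustration that assumes each pixel is represented by its brightness across a set of frames; the function names and sample values are hypothetical, and the use of the population standard deviation is an assumption rather than a detail taken from Cogliati et al.

```python
import statistics


def high_delta(pixel_values):
    """Difference between the largest and second-largest value seen at one pixel."""
    ordered = sorted(pixel_values, reverse=True)
    return ordered[0] - ordered[1]


def background_level(pixel_values):
    """Largest value that lies below mean + 2 standard deviations."""
    limit = statistics.mean(pixel_values) + 2 * statistics.pstdev(pixel_values)
    candidates = [v for v in pixel_values if v < limit]
    # If every frame holds the same value, nothing is strictly below the limit.
    return max(candidates) if candidates else limit


# Hypothetical brightness of one pixel across ten frames.
hot_pixel = [200, 201, 199, 200, 202, 200, 201, 199, 200, 201]  # always bright
strike = [3, 2, 4, 3, 2, 90, 3, 4, 2, 3]                        # one isolated spike

hot_delta = high_delta(hot_pixel)
strike_delta = high_delta(strike)
strike_background = background_level(strike)
```

A pixel that is bright in every frame yields a small delta, whereas an isolated spike yields a large one, which is one way to read the claim that the method suppresses both thermal and defective pixel noise.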

2.9 Smart Phone Flocks


An example of monitoring the movements of groups of people via sensor fusion can
be found in Detecting Pedestrian Flocks by Fusion of Multi-Modal Sensors in Mobile
Phones [24]. While much of the prior discussion has been focused on the concept of
recognizing activities performed by individuals, this work focuses on the joint
identification of the indoor movement of multiple people forming a flock. A flock can be
thought of as a group of persons moving in the same direction for some duration, or
more formally as the existence of a moving cluster with regards to the ground truth
location data. Stated algebraically, a pedestrian flock $F$ is a moving
cluster that exists for a duration $t > \tau$ and consists of $n > \nu$ people,
where $\tau$ and $\nu$ are application specific. Kjærgaard, Wirz, Roggen, and Tröster
found that combining sensors in a multi-modal fashion improved the accuracy over unimodal
approaches. The multi-modal approach allows for robustness when a single category
of sensor may fail; detection accuracy improves in the multi-modal approach, and
energy savings may be achieved through specific combinations of sensors for detecting
flocks in specific environments. One scenario where the detection of a pedestrian flock
is desirable is to aid emergency personnel during evacuation processes. In addition to
aiding emergency personnel, it would be beneficial to target the flock with location
and movement appropriate messaging.

Identifying a pedestrian flock is performed via a cluster-based weighted majority
voting system. Weighted majority voting is performed that outputs a set of clusters
that the majority of features agree on. As the flocks are clusters of people where a
majority stay together over time, temporal clustering is performed to combine highly
similar clusters that exist for several successive time windows into flocks. This
process allows Kjærgaard et al. to output devices grouped into flocks, and thereby people
as well [24]. Features are generated from the data output by the accelerometer,
magnetometer, and GPS sensors, as well as the WiFi radio. Accelerometer features
are used to correlate movement and acceleration variance similarity between potential
flock members. Magnetometer features are used to correlate turn and relative heading
changes; the similarity of the changes is compared between potential flock members.
GPS data is used to examine proximity, speed, and heading differences from
location-fingerprinting when available. The WiFi radio signal feature is observed to
determine similarity in signal strength. Performing pair-wise correlation on the above
features results in an $n \times n$ matrix for each feature. Then, weighting the features
abstracted from the four sensors helps to identify cell phones that belong to the same
pedestrian flock. This is performed with both spatial and temporal clustering.

Using the accelerometer data to detect pedestrian flocks, Kjærgaard et al. utilize
the Overlap in Movement Behavior (OMB) and Windowed Cross-Correlation of
Acceleration (WCCA) algorithms [24]. OMB is used to correlate cell phones that exhibit
similar activity; activity recognition is performed at a rather coarse-grained level
in their research, identifying stationary vs. moving activities. After computing two
lists of moving and stationary states, $M_a$ and $M_b$, one for each device being
compared, the following OMB similarity feature computation is performed over a specified
time window of $n$ measurements:

$$s_{a,b} = \frac{\left|\{\, i : M_a[i] = M_b[i] \,\}\right|}{n}$$
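
A minimal sketch of an overlap computation of this kind is given below. It assumes that OMB is the fraction of the n measurements in the time window for which two devices report the same coarse movement state; the function name, the state labels, and the sample lists are hypothetical.

```python
def omb_similarity(states_a, states_b):
    """Fraction of time steps where two devices share the same movement state."""
    if len(states_a) != len(states_b) or not states_a:
        raise ValueError("state lists must be non-empty and equally long")
    matches = sum(1 for a, b in zip(states_a, states_b) if a == b)
    return matches / len(states_a)


# Hypothetical per-window activity labels for two phones.
m_a = ["moving", "moving", "stationary", "moving"]
m_b = ["moving", "stationary", "stationary", "moving"]
similarity = omb_similarity(m_a, m_b)
```

Under this reading, two phones carried by members of the same flock would be expected to score close to one, while unrelated phones would agree only by chance.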

Using the WCCA method, Kjærgaard et al. are able to analyze acceleration signals
to determine whether two signals are the result of similar movement behavior. The
WCCA method considers variance of the signal magnitude to mask variations in
device orientation and small differences in movement trajectories. This method allows
flexibility in cross-correlation as members of a flock are not constrained to walk in step;
thus the similarity of a pair of devices is computed from measurement streams
of acceleration magnitude. Two lists of acceleration magnitudes, $V_a$ and $V_b$, are
computed, one for each device being compared. Since behavior changes can be shifted in
time between flock members, the maximum cross correlation is computed with a lag $\ell$
between minus one and plus one second. As such the WCCA is computed by:

$$s_{a,b} = \max\left(\mathrm{corr}(V_a, V_b, \ell),\ \ell \in [-1, 1]\right)$$
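
The maximum cross-correlation over a bounded lag can be sketched as follows. The example assumes Pearson correlation as the correlation measure and expresses the lag in whole samples rather than seconds; both choices, along with the function names and sample data, are assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def max_lagged_correlation(a, b, max_lag):
    """Largest correlation of a and b over integer lags in [-max_lag, max_lag]."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = a[lag:], b[:len(b) - lag]
        else:
            xs, ys = a[:lag], b[-lag:]
        n = min(len(xs), len(ys))
        if n >= 2:
            best = max(best, pearson(xs[:n], ys[:n]))
    return best


# Hypothetical acceleration magnitudes; v_b repeats v_a one sample later.
v_a = [1.0, 1.2, 3.0, 1.1, 1.0, 2.5, 1.0, 1.1]
v_b = [1.0, 1.0, 1.2, 3.0, 1.1, 1.0, 2.5, 1.0]
score = max_lagged_correlation(v_a, v_b, max_lag=1)
unshifted = pearson(v_a, v_b)
```

With a sampling rate of f Hz, a lag window of plus or minus one second corresponds to `max_lag = f` samples. The shifted pair scores higher with the lag search than without it, which is the tolerance to out-of-step walking that the method is meant to provide.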


Magnetometer data features are used to determine whether individuals walk
together as measured by the phone’s magnetic orientation; Kjærgaard et al. use the
Windowed Cross-Correlation in Relative Heading (WCCH) changes and Time Since Last
Turn (TSLT) algorithms [24]. Similar to the WCCA used for accelerometer features,
each device’s heading is computed and compared to another. As such two lists of
heading deviations, $H_a$ and $H_b$, are cross correlated with:

$$s_{a,b} = \max\left(\mathrm{corr}(H_a, H_b, \ell),\ \ell \in [-1, 1]\right)$$

The TSLT algorithm first detects turns and then computes the duration of time between
turns to determine whether similarity exists. Turns are computed by comparing the
difference of the mean compass orientation measurements for a device:

$$\gamma = \mathrm{mean}(w_1(C_a)) - \mathrm{mean}(w_2(C_a))$$

against the standard deviation of the compass orientation measurements for the device:

$$\gamma > \frac{\mathrm{StdDev}(w_1(C_a)) + \mathrm{StdDev}(w_2(C_a))}{2} + G$$

where $G$ is a guard factor. From this information, Kjærgaard et al. compute a list,
$K$, for each device which has a zero for when a turn is detected and else the previous
value; the lists for two devices are then compared over the window $[t_0 - T, t_0]$:

$$s_{a,b} = \sum_{t = t_0 - T}^{t_0} \left| K_{a,t} - K_{b,t} \right|$$
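
The turn test can be sketched as follows, assuming that the change in mean heading between two consecutive windows is compared, in absolute value, against the average of the two window standard deviations plus a guard factor. The function name, the use of the absolute value, the population standard deviation, and the sample headings are assumptions for illustration; wrap-around at 360 degrees is ignored.

```python
import statistics


def is_turn(window_1, window_2, guard):
    """Flag a turn when the mean heading change exceeds the averaged spread plus a guard."""
    change = statistics.mean(window_1) - statistics.mean(window_2)
    spread = (statistics.pstdev(window_1) + statistics.pstdev(window_2)) / 2
    return abs(change) > spread + guard


# Hypothetical compass headings in degrees.
before = [90, 91, 89, 90]
after_turn = [178, 181, 180, 181]
after_straight = [91, 90, 90, 89]

turned = is_turn(before, after_turn, guard=10)
kept_heading = is_turn(before, after_straight, guard=10)
```

The guard factor keeps ordinary compass jitter from registering as a turn; only a heading change that clearly exceeds the noise within the two windows is reported.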


WiFi features are analyzed to determine spatial features where WiFi positions are
applied to a predefined map of signal strength measurements and to determine signal
strength features where flocks are detected. The spatial features model the similarity
between two mobile devices as the shortest walking distance between their positions
via the predefined map; devices that have larger walking distances will be less likely to
be clustered. Signal features are computed for devices based off their signal strength
vectors and compared to other devices to derive their Euclidean distance. Additional
signal features are SpatialSpeed and SpatialHeading, which are computed as the
minimum sum of differences in speed and heading within a window of time. WiFi
features help to identify an individual’s location with location-fingerprinting and
signal strength (when available), and the SpatialSpeed and SpatialHeading are able to
be cross-correlated to determine device movement similarity; thus the WiFi features
aid in providing finer-grain detection of pedestrian flocks.

Having chosen the feature correlation algorithms, Kjærgaard et al. explore different
clustering techniques to determine whether pedestrian clusters exist [24]. Using the
geometric features found in the WiFi spatial and signal features, Kjærgaard et al. use
a hierarchical clustering algorithm. Using the non-geometric features (i.e., the rest
of the features), Kjærgaard et al. use a density clustering algorithm. The clustering is
run against the similarity matrices described previously for each feature. Once the
clustering has been performed, the clusters are fused to improve the overall quality;
this is done by weighted majority voting to combine clusters identified in the different
feature spaces. The weighting is based on the quality of the selected features. In order
for devices to become members of the same flock, the devices must exhibit similar feature
sets and quality, and they are required to have membership in successive time stamps
to join a flock. Flock recognition was most accurate when using OMB, TSLT, spatial,
and signal features; thus a fusing of accelerometer, magnetometer, and WiFi produced
the most accurate results. When wireless access points (WAPs) were not available, or
the location-fingerprinting was not achievable, OMB and TSLT performed best. GPS
did not prove worthwhile to fuse due to the indoor nature of the research performed.


2.10  Natural  Event  Entity  Recognition 

Using the sensors within a cell phone for detections beyond the human activity
realm is an area of research ripe for study. Faulkner et al. have developed a
process to utilize cell phone sensors to monitor for external environmental events,
namely earthquakes [12] [11]. Faulkner et al.’s research in [12] lays the foundation
for detecting events that are difficult to model and characterize a priori with
heterogeneous, community-operated sensors. As envisioned, each sensor detects unusual
observations and will notify a fusion center of such observations. Determining the
threshold for an unusual observation typically relies on conditional probabilities such
as:

$$\frac{P[X_{s,t} \mid E_t = 1]}{P[X_{s,t} \mid E_t = 0]} \geq \tau$$

However, an event such as an earthquake, due to its rarity, does not have
sufficient data to obtain good probability models. In addition, since the composition and
placement of sensors is heterogeneous, each will record varying environmental factors.
Lastly, while it is plausible that much of the higher math could be performed at the
fusion center to determine whether an event has taken place, bandwidth limitations
and resource availability necessitate developing a more reliable method for event
detection at the cell phone level. Faulkner et al. developed a pick method whereby the
transmission of false-positives to a fusion center could be mitigated. Using a likelihood
specific to each sensor, a variation of the above probability threshold will determine
whether a signal is sufficiently different from normal data so that the probability of
an event having taken place is significant:

$$\frac{P[x \mid E_t = 1]}{P[x \mid E_t = 0]} > \frac{P[x' \mid E_t = 1]}{P[x' \mid E_t = 0]}$$

thus the less probable $x$ is under normal data, the larger the likelihood ratio gets in
favor of the anomaly. In order to get this equation to work as desired, Faulkner et
al. have to establish the parameters for a sensor to estimate the distribution in an
online manner, establish a sensor specific threshold for anomaly recognition, and then
develop the true positive, false positive, and appropriate anomaly threshold rate for
the fusion center.

In order to establish an online density estimation for each sensor, Faulkner et
al. develop a methodology to estimate the distribution of normal observations over
time, $L_0(X_{s,t}) = P[X_{s,t} \mid E_t = 0]$, for non-events. This is done by using a
parametric approach:

$$P[X_{s,t} \mid E_t = 0] = \phi(X_{s,t}, \theta)$$

This model improves when the time span of sensing increases and thus the availability
of training data increases. The fusion center can send back updated $\theta$ to each device
in order to improve their detection algorithms. In order to set the online threshold
estimation for a specific sensor so that the per-sensor false positive rate can be
controlled, an appropriate $\tau_s$ must be chosen. Using the $\epsilon$-approximation to
limit the search space

iv 

then assuming that $\tau_s$ is obtained through a percentile estimation for $p_0$,
$\tau_s$ can be found by

$$p_0 = P[L_0(x_{s,t}) < \tau_s]$$

The  two  above  probability  functions  complete  the  variables  necessary  for  earthquake 
detection  on  a  cell  phone.  Without  getting  into  the  algorithmic  process  present  at 
the  fusion  center,  the  method  by  which  the  network  identifies  earthquakes  will  be 
discussed  [12].  After  each  sensor  has  learned  the  decision  rules  that  allow  for  the 
control of system-level false positive rates, each sensor decides on its own whether it
believes  an  event  has  taken  place.  When  a  sensor  believes  an  event  has  taken  place, 
it  sends  a  pick  message  to  the  sensor  fusion  center.  The  fusion  center  will  then  decide 
whether  an  event  has  occurred  by  comparing  the  number  of  sensors  reporting  1  for 
an  event  versus  0  for  a  non-event. 
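
The per-sensor decision and the fusion step can be sketched as below. The sketch assumes a Gaussian model for normal (non-event) data, picks the threshold as a percentile of the likelihoods observed on training data, and lets the fusion center count picks against a fixed minimum. The Gaussian choice, the function names, `min_picks`, and the sample values are all assumptions for illustration and are not taken from Faulkner et al.

```python
import math
import statistics


def fit_normal_model(training):
    """Estimate theta = (mean, std) from observations assumed to be non-events."""
    return statistics.mean(training), statistics.pstdev(training)


def likelihood(x, theta):
    """Gaussian density of x under the normal-data model (std must be non-zero)."""
    mean, std = theta
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))


def pick_threshold(training, theta, p0):
    """Likelihood value below which roughly a fraction p0 of normal data falls."""
    scores = sorted(likelihood(x, theta) for x in training)
    index = min(int(p0 * len(scores)), len(scores) - 1)
    return scores[index]


def is_pick(x, theta, tau):
    """A sensor sends a pick when x is sufficiently unlikely under normal data."""
    return likelihood(x, theta) < tau


def fusion_center_decides(picks, min_picks):
    """Declare an event when enough sensors report 1 (pick) rather than 0."""
    return sum(picks) >= min_picks


# Hypothetical resting accelerometer deviations (in g) used as training data.
resting = [0.00, 0.01, -0.01, 0.02, -0.02, 0.00, 0.01, -0.01, 0.03, -0.03]
theta = fit_normal_model(resting)
tau = pick_threshold(resting, theta, p0=0.1)

shaking_is_pick = is_pick(0.5, theta, tau)
resting_is_pick = is_pick(0.0, theta, tau)
event = fusion_center_decides([1, 1, 0, 1], min_picks=3)
no_event = fusion_center_decides([0, 1, 0, 0], min_picks=3)
```

Only the single pick bit leaves the phone in this arrangement, which reflects the bandwidth argument made above for deciding at the sensor rather than at the fusion center.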

The task of detecting the earthquake is left to the accelerometer; accelerometer and
location data are fused to report on acceleration values where location is determinable
[12]. The earliest iteration of the work required the cell phone to be plugged in and
laid down in order to sense events; this allowed for plenty of computational resources
as well as a stable setting where human interaction would have minimal impact on
sensing. Experiments comparing earthquake acceleration values against the standard
deviation of resting sensors have revealed that an earthquake that registers 4.0 on the
Richter scale would be the minimum detectable by a cell phone sensor. As in previous
research, signal rotation is necessary to determine the estimated gravity components
in the negative z-axis. The picking algorithm could then be utilized to analyze live
data to determine whether it is anomalous. The pick data would be sent to the cloud
fusion center (CFC). Using received picks and a geographic hashing, the CFC would
send heartbeat messages to nearby phones to determine whether they are active or
not. The geographic hashing would ascribe integer hashes to a grid of
latitude/longitude coordinates, and the grid cell size would be determined by the
propagation rate of seismic waves and the feature calculation window dictated by the
extraction algorithm. In addition, the geographic cells that received picks would be put
into time windowed buckets at the CFC for processing. With the received picks, the
location of the picks on the grid, and an arrival time captured, the CFC works to
probabilistically determine whether an event has taken place.
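
A geographic hash of this kind can be sketched as a mapping from coordinates to an integer grid cell, with picks then grouped by cell and time window. The cell size in degrees, the window length, the function names, and the sample picks are hypothetical; the source states only that the cell size follows from the seismic propagation rate and the feature calculation window.

```python
import math


def grid_cell(lat, lon, cell_deg):
    """Map a latitude/longitude pair to an integer grid cell identifier."""
    rows = math.ceil(180 / cell_deg)
    cols = math.ceil(360 / cell_deg)
    row = min(int((lat + 90) / cell_deg), rows - 1)
    col = min(int((lon + 180) / cell_deg), cols - 1)
    return row * cols + col


def bucket_picks(picks, cell_deg, window_s):
    """Group (timestamp, lat, lon) picks by (grid cell, time window)."""
    buckets = {}
    for timestamp, lat, lon in picks:
        key = (grid_cell(lat, lon, cell_deg), int(timestamp // window_s))
        buckets.setdefault(key, []).append(timestamp)
    return buckets


# Hypothetical picks: two phones close together in time and space, one far away.
picks = [
    (100.0, 34.05, -118.25),
    (101.5, 34.06, -118.24),
    (130.0, 40.71, -74.00),
]
buckets = bucket_picks(picks, cell_deg=0.5, window_s=10)
```

Picks that fall into the same bucket arrived from the same region within the same time window, which is the evidence the fusion center needs before weighing whether an event occurred.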


2.11 Identifying Clusters of Importance

In a work from 2004, researchers investigate the use of location aware cell phones
and interactive clustering in the development of a personal gazetteer to identify and
locate important destinations [46]. Zhou, Frankowski, Ludford, Shekhar, and Terveen
identify an individual’s most important places (e.g., home, work, grocery store,
etc.). Zhou et al. developed an application to capture a user’s locations throughout the
day; from this set of location data, an algorithm determines which data points
represent a cluster and thus indicate proximity to an important place. Non-deterministic
approaches such as K-Means clustering and deterministic approaches such as
density-based clustering were both considered. A density-based deterministic algorithm
was chosen as it allows clusters of arbitrary size, robustly ignores outliers, noise, and
unusual points, and provides deterministic results.

$$N(p) = \{\, q \in S \mid \mathrm{dist}(p, q) \leq Eps \,\}$$
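
The neighborhood definition can be sketched directly. The example assumes planar coordinates (for instance metres in a local projection), includes points at exactly distance Eps, and adds a hypothetical `min_pts` density test of the kind density-based algorithms commonly use; none of these parameter values come from Zhou et al.

```python
import math


def neighborhood(p, points, eps):
    """All points q in the set whose distance to p is at most eps."""
    return [q for q in points if math.dist(p, q) <= eps]


def is_dense(p, points, eps, min_pts):
    """A point is dense when its eps-neighborhood holds at least min_pts points."""
    return len(neighborhood(p, points, eps)) >= min_pts


# Hypothetical location fixes in planar coordinates.
fixes = [(0, 0), (3, 4), (10, 10), (1, 1)]
near_origin = neighborhood((0, 0), fixes, eps=5.0)
origin_dense = is_dense((0, 0), fixes, eps=5.0, min_pts=3)
far_dense = is_dense((10, 10), fixes, eps=5.0, min_pts=3)
```

Isolated fixes such as the point at (10, 10) never reach the density requirement, which is how outliers and noise are ignored without any random initialization.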

The density-based clustering algorithm uses temporal pre-processing techniques to
reduce the number of uninteresting places that are discovered; as such, locations with
speeds greater than zero and locations in close proximity to another reported location
are discarded, greatly reducing the amount of data. Additionally, the preprocessing
step would aid in the removal of frequent (and similar) stop locations that may exhibit
inconsistency in zero speed readings (and location parameters), such as traffic lights.
Then the density-based algorithm can comb through the spatiotemporal history using
the time-stamped location data to discover the personal gazetteer. When combing
through the data, significant events can be detected by the loss (or gaining) of GPS
signals, as this indicates entering (or departing) a building or similar structure.
This GPS signal change makes the use of a clustering approach unnecessary for the
detection of certain places, but to detect locations such as parks, stadiums, or sidewalk
cafes where a GPS signal is constant, the density-based algorithm proves necessary.
Applications of this research extend beyond the concept of personal gazetteers and
into the realm of partner matching for car pooling/transportation needs.

As data clustering has presented itself as a necessary technique for the recognition
of events, locations, and entities, a review of techniques is presented in [14].
Clustering (or grouping) of common elements is accomplished in either an exploratory or
confirmatory manner based on either a natural grouping (identifiable through
analysis) or goodness-of-fit (as a model postulates). Clustering is accomplished through
a five-step process involving pattern representation, definition of a pattern
proximity, the clustering or grouping, data abstraction, and the assessment of the output.
Pattern representation refers to the number of patterns identifiable by the clustering
algorithm.

Pattern Representation → Pattern Proximity → Clustering → Abstraction → Output Assessment

A set of features is presented to the algorithm to utilize in the identification of
patterns. The selection of the features is the process of identifying the most effective
subset of features to utilize in clustering. Feature extraction is the use of one or more
transformations of the input features to produce new features. The use of feature
selection and/or feature extraction is often the crux of most recognition research;
considerable effort is made to identify the feature sets that produce the best results.
Patterns can be based on either quantitative or qualitative features. Quantitative
features are typically continuous values, discrete values, or interval values. Qualitative
features are nominal (unordered) or ordinal values.

Pattern proximity is measured by a distance function defined on pairs of patterns.
Euclidean distance is simply one variety of distance measure used to determine how
similar two patterns are to one another. Patterns that are closer together share a
higher likelihood of sharing a classification as compared to those that are further
apart. Euclidean distance can be found by:

$$d_2(x_i, x_j) = \left( \sum_{k=1}^{d} \left| x_{i,k} - x_{j,k} \right|^2 \right)^{1/2}$$

of which there are a number of different derivatives based on the features being
compared.
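
The distance measure and one family of its variants can be sketched in a few lines. The example uses the general Minkowski form, of which Euclidean distance is the case p = 2; the function name and sample vectors are hypothetical.

```python
def minkowski_distance(x_i, x_j, p=2):
    """Distance between two equal-length feature vectors; p=2 is Euclidean."""
    if len(x_i) != len(x_j):
        raise ValueError("patterns must have the same number of features")
    return sum(abs(a - b) ** p for a, b in zip(x_i, x_j)) ** (1 / p)


# Two hypothetical patterns with d = 2 features each.
euclidean = minkowski_distance((0, 0), (3, 4))        # p = 2
manhattan = minkowski_distance((0, 0), (3, 4), p=1)   # p = 1 variant
```

Because features on larger scales dominate such sums, feature vectors are commonly normalized before the distance is computed.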

Clustering or grouping can be performed in a number of ways. In hard clustering,
clusters are separated by a partition whereby the data is grouped according to some
common property. In fuzzy clustering, clusters may vary and depend on varying
association with a set of patterns, as clusters may share properties with multiple patterns.
These clustering techniques can be further categorized as hierarchical or partitional.
In hierarchical techniques, algorithms produce a nested series of partitions based on
merging or splitting criteria. Partitional clustering identifies the partition that
optimizes a particular criterion (usually at a local level). Additionally, probabilistic and
graph-theoretic clustering techniques are described by P.J. Flynn in section 5 of his
work [14].

Data abstraction is the process of abstracting a simple and compact representation
of a data set. Abstraction is performed either to achieve efficient machine-based
processing for output assessment or to represent the data in an easy-to-comprehend
manner for human-oriented review.

Output assessment is the process of confirming cluster validity. If the output of a
clustering algorithm is unusable, one of the four prior steps needs to be reimplemented.

Data  clustering  techniques  can  be  further  broken  down  into  the  following  tax¬ 
onomies  [14]:  agglomerative  vs.  divisive,  monothetic  vs.  polythetic,  hard  vs.  fuzzy, 
deterministic vs. stochastic, and incremental vs. non-incremental. An agglomerative
approach begins with each pattern in a distinct cluster and successively merges
clusters together until a stopping criterion has been satisfied. A divisive approach
begins with all patterns in a single cluster and performs splitting until a stopping
criterion has been reached. In a monothetic approach, the algorithm considers features
sequentially to divide the given collection of patterns by distance. A polythetic
approach is one in which all the features available to an algorithm enter into the computation
of the distance between patterns. The polythetic approach is used far more often than
the monothetic approach, since the overall distance measured in a monothetic approach
will vary according to the order of feature comparison. Hard and fuzzy techniques were
described in the previous paragraph and are related to the degree of inclusivity a pattern
has with a classification. Deterministic approaches rely on traditional optimization
techniques, whereas stochastic approaches resort to randomized searches such as genetic
or evolutionary algorithms. An incremental approach evaluates patterns one at a time and
functions best for small data sets with a minimal number of classifications; thus a method
that employs incremental algorithms should work to minimize the number of scans through
a pattern set, reduce the number of patterns examined, or reduce the size of the data
structure used. A non-incremental approach is utilized when constraints on execution
time or memory space affect the architecture of the algorithm. Choosing the right
approach to clustering is an important step in recognition activities and is guided by the
sensors being used and the features abstracted from the data generated by the sensors.
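
The agglomerative approach described above can be illustrated with a minimal sketch. It assumes one-dimensional patterns, single-linkage distance (the gap between the closest members of two clusters), and a stopping criterion expressed as a maximum merge gap; the function name and the `max_gap` parameter are hypothetical and are not drawn from the reviewed literature.

```python
def agglomerative_clusters(points, max_gap):
    """Single-linkage agglomerative clustering of 1-D points (illustrative only)."""
    # Start with every pattern in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest to each other.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gap = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        gap, i, j = best
        # Stopping criterion: the closest pair is still too far apart to merge.
        if gap > max_gap:
            break
        # j > i, so removing cluster j does not shift the position of cluster i.
        clusters[i].extend(clusters.pop(j))
    return [sorted(c) for c in clusters]
```

A divisive variant would run in the opposite direction, starting from a single cluster and splitting until the same kind of criterion is met.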


2.12  Multimodal  Activity  Recognition 

In the research paper Comprehensive Context Recognizer Based on Multimodal
Sensors in a Smart-Phone, Han, Vinh, Y. Lee, and S. Lee seek to fuse the optimal
combinations of sensors together in order to determine the user’s context (activity)
[17]. Using multiple sensors, namely accelerometer, audio (microphone), and signal
(GPS, WiFi), Han et al. work to increase both the number of activities recognized
and the ability to recognize concurrent activities, such as
someone using the cell phone to make a call while walking. This ability to recognize
context within context has benefits for resource preservation. As an example, the system
utilized the accelerometer to detect transition points from pedestrian activities to
transportation activities, and vice versa. When the accelerometer detects transportation,
the WiFi receiver may be asked to identify private WiFi connections, which will
be far more common on a bus than on a subway. For instances where the accelerometer
is not able to provide the fine-grained detail needed in this study, the audio classifier
would be enabled to further classify a transportation activity. Additionally, by identifying
the feature sets best able to identify activities, sensors that generate data for
unused feature sets can be disabled until a context change is detected.

Utilizing multimodal sensors requires a balance in classifier selection. In some
instances where the data streams and features are similar, such as accelerometer,
gyroscope, and magnetometer data, the same classifier can be chosen. In cases where
sensors with dissimilar data output are chosen, such as accelerometer and microphone,
multiple classifiers will be required. In their research into multimodal sensors, Han et al.
chose a Gaussian Mixture Model (GMM) and a Hidden Markov Model (HMM)
for the accelerometer and audio classifiers, respectively [17]. The GMM allows for the
use of multiple dimensions of features where there may be multiple distributions of
the data represented. The HMM was chosen for the audio classifier as there are only
two audio signatures being detected and distinguished, the bus and the subway.
Unlike the very fine-grained accelerometer approach to recognizing the differences
between acceleration and deceleration patterns in buses and subways demonstrated
in [18], Han et al. utilize a coarser classifier that activates the audio classifier
when more fine-grained detail is required. This difference in approaches demonstrates
the flexibility present in the suite of sensors available in cell phones. The features
extracted for activity recognition vary among the research. The best features are
selected from candidates such as standard deviation, mean crossing rate, Pearson
correlation coefficients, frequency domain features, and linear predictive coding
features. Due to the ’curse of dimensionality’, using all available features
would not necessarily result in a more accurate recognition of activity; as such, it is
prudent to select the best features. Han et al. have built an algorithm that seeks to
select features based on two qualities: the first being the relevancy of the feature (or
the classification power) and the second being the redundancy of the feature (or the
similarity of two features). Once the relevance and redundancy have been calculated
for each feature, a greedy forward search technique is applied to selectively extend
the feature set for inclusion into the classifier suitable for that sensor’s data.

In the research Preprocessing Techniques for Context Recognition from Accelerometer
Data, Figo, Diniz, Ferreira, and Cardoso provide additional scenarios where the
use of activity (context) recognition proves useful [13]. Additionally, an overview of
numerous features is discussed in detail for the time, frequency, and discrete
representation domains. In addition to the concept of activity recognition, Figo et al.
advocate that by analyzing an individual’s activities over the course of days, weeks, or
months, a more interactive experience can be offered to users. Two examples
are the ability to aid the elderly and the ability to offer value-added information. In
the case of elderly aid, if an awareness of a user’s activity could flag an abrupt
change as a potential warning sign, such as an elderly individual taking a fall, it is
conceivable that an emergency service could more easily and accurately be made aware of
the situation. As a value-added situation, consider the case of an activity recognizer
that knows an individual runs at a certain time of day or on a particular route. If the
activity recognizer can correlate this information with a weather forecast or road
construction, it is conceivable that the user could receive suggestions to alter their
time or route.


2.13  Attribute  Selection 

In order to offer value-added services to a user, it is necessary to possess the ability
to acquire, manage, process, and obtain useful information from the raw sensor data.
From this sensor data, devices must be able to accurately discover the characteristics
or features of the signal coming from the sensor. Figo et al. discuss the layered
architecture responsible for this task:


Sensor Data → Preprocessing → Sensor State → Classification → User Context → Applications

Within the preprocessing layer there are effectively two layers: the base layer that
determines whether there is a specific short-term context or state, and the base-level
classifier that determines the type of activity being performed. The preprocessing is
split because it is easier to identify a short-term context, such as the absence of light or the
presence of a quick movement (like a fall), whereas it is more computationally intensive to
accurately identify and classify a specific type of exercise. In processing the signals
for features, Figo et al. explore features related to the time, frequency, and what they
call the discrete representation domains [13].

Time domain features are those that are derived via simple mathematical and statistical
metrics from the raw sensor data. These techniques compute features from the
sensor data according to some determined time window. The most common features
available in the time domain are the mean, median, variance, standard deviation,
min, max, range, RMS, correlation, cross-correlation, and the integration features.
The mean is calculated over some window and is typically used to determine a user’s
posture and whether an activity type is static or dynamic. The mean is also used
as a preprocessing component, as knowing the mean aids in the removal of random
spikes and noise, smoothing the overall dataset. The median is utilized to replace
missing values. The variance is the average of the squared differences from the mean
and is utilized where a threshold is required for classification. The standard deviation
is the square root of the variance and represents both the variability of the
dataset and a probability distribution. The standard deviation is an indication of the
stability of a signal; however, it becomes less useful if spurious values are included.
Taken together, the variance and standard deviation are often used as a signal feature
to infer user movement. The range can be used with other indicators to distinguish
between similar activities, such as running and walking, that differ in amplitude.
The RMS is used to classify wavelet results such as those identifiable in walking
and biking; additionally, the RMS has proven useful as an input for neural networks.
The integration metric measures the signal area under the curve to obtain speed and
distance and, in conjunction with the RMS signal, to calculate the angular
velocity from the gyroscope. The signal correlation is used to measure the strength
and direction of a linear relationship between two signals. The correlation is useful for
differentiating between two activities that involve translation into a single dimension.
The degree of correlation requires calculating the correlation coefficient and is used to
determine which classifiers are the best for recognizing activities. Cross-correlation
is the measure of the similarity between two waveforms and is used to search for a
known pattern in a long signal.
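
Several of the time domain features above can be sketched in a few lines. The sketch below assumes a window is a plain list of samples from a single sensor axis and uses the population variance, matching the definition given above; the function names are illustrative and are not taken from Figo et al.

```python
import math
import statistics


def time_domain_features(window):
    """Simple statistical features computed over one window of samples."""
    mean = statistics.fmean(window)
    # Average of the squared differences from the mean (population variance).
    variance = statistics.pvariance(window, mu=mean)
    return {
        "mean": mean,
        "median": statistics.median(window),
        "variance": variance,
        "std": math.sqrt(variance),
        "min": min(window),
        "max": max(window),
        "range": max(window) - min(window),
        "rms": math.sqrt(sum(x * x for x in window) / len(window)),
    }


def pearson_correlation(a, b):
    """Strength and direction of the linear relationship between two signals."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    # A constant signal has no defined correlation; report 0.0 in that case.
    return covariance / norm if norm else 0.0
```

All of these require only a single pass or two over the window, which is consistent with the low computational cost attributed to time domain features.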

Additional time domain features are the differences, angular velocity, zero-crossings,
Signal Magnitude Area (SMA), Signal Vector Magnitude (SVM), and the Differential
Signal Vector Magnitude (DSVM). Sample differences, when arranged pairwise, allow for a
basic comparison of the intensity of user activity. Zero-crossings
are the points where a signal passes through a specific value corresponding to half of
the signal range and are used for recognition of step movements and the detection of
appropriate timing for the application of other techniques. Zero-crossings are used
in conjunction with HMM to detect complex human gestures. Angle and angular
velocity are used for detection of user orientation and have proven useful for fall
detection as well as location detection through gyroscopic means [13]. SMA is used
to compute the energy expenditure during periods of activity. Additionally, SMA
can be used to distinguish between resting and user activity. SMA is often used in
conjunction with SVM to identify possible falls and classify behavior patterns and
with DSVM for dynamic activity recognition using thresholds and single metrics.
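
As a rough illustration of these features, the sketch below uses commonly cited definitions: SMA as the mean of the summed absolute values of the three axes, and SVM as the per-sample Euclidean norm of the acceleration vector. Normalization conventions differ between papers, so these formulas are assumptions rather than the exact forms used in [13]. The crossing counter follows the description above and uses the midpoint of the signal range as the reference level.

```python
import math


def signal_magnitude_area(xs, ys, zs):
    """Mean over the window of |x| + |y| + |z| (assumed normalization)."""
    n = len(xs)
    return sum(abs(x) + abs(y) + abs(z) for x, y, z in zip(xs, ys, zs)) / n


def signal_vector_magnitude(xs, ys, zs):
    """Per-sample Euclidean norm of the three-axis reading."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(xs, ys, zs)]


def mid_range_crossings(signal):
    """Count how often the signal passes through the midpoint of its range."""
    level = (min(signal) + max(signal)) / 2.0
    crossings = 0
    for previous, current in zip(signal, signal[1:]):
        # A strict sign change around the level counts as one crossing;
        # samples lying exactly on the level are not counted.
        if (previous - level) * (current - level) < 0:
            crossings += 1
    return crossings
```

A step counter could, for example, threshold the number of crossings per window, while a fall detector could watch for spikes in the SVM series.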

Frequency domain features are used to capture the repetitive nature of a sensor
signal. The repetition often correlates to the periodic nature of a specific activity.
Commonly used frequency domain features are generated from the Fast Fourier
Transform (FFT) and the Fast Time Fourier Transform (FTFT). Frequency domain
features are the DC component, spectral energy, information entropy, spectral analysis
of coefficients, wavelet analysis, and symbolic string domain analysis. Using the FFT,
it is possible to derive frequency domain features similar to those obtained in the time
domain, such as averages and dominant frequency components. Using the FFT, the
DC component is generated and used together with other signal characteristics to
determine activity. Spectral energy is the energy of a signal and is used during single
axis accelerometer activity recognition, and during operations to determine context
through audio recording. Information entropy helps to differentiate between signals
that have similar energy values but correspond to different activity patterns.
Together with the mean, energy, and correlation, information entropy has been used
to classify activities that contain similar energy levels. Spectral analysis of specific
coefficients has been used to aid in activity recognition. Using the coefficient of
magnitude and frequency peaks within specified frequency ranges, the determination
of step rates has been accomplished via spectral analysis. Wavelet analysis can be
used to examine the time-frequency characteristics of a signal. Wavelet analysis has
been used to differentiate and then classify activities that are similar, such as horizontal
walking versus stair climbing. Transformation into the symbolic string domain
is used to map signals to strings for matching purposes; it is used to evaluate string
similarity and thus find known patterns. In order to facilitate the recognition and
classification of symbolic strings, there are three distance formulas used to compute
the distance (or similarity) of strings. Euclidean distances are found via:


$$d(u, v) = \sqrt{\sum_{i=1}^{n} (v_i - u_i)^2}$$


and are used as a distance between symbols. The Levenshtein edit distance makes it
possible to determine which signal, out of a set of possible signals represented as
symbols, is the closest to a given signal. The Levenshtein edit distance is found
via dynamic programming:


$$d(i, j) = \min\{\, d(i-1, j) + \mathrm{insert},\; d(i, j-1) + \mathrm{insert},\; d(i-1, j-1) + \mathrm{subs}(i, j) \,\}$$

where m and n are the lengths of the two strings and d is an m x n table which is initialized
with the costs of creating the input strings. The last distance formula is the Dynamic
Time Warping (DTW) process, which is used to measure the similarity between two
sequences that may vary in length and can thus correspond to different time bases. This
DTW approach involves finding the mapping W, where in some cases an element of
one string can map to a sequence of consecutive elements in another string:

$$\min\left\{ \frac{1}{K} \sqrt{\sum_{k=1}^{K} w_k} \right\}$$

where $w_k$ is the k-th element of the mapping W, K is its length, and the cost of the path
through the cost matrix is found using dynamic programming.
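
Both dynamic programming distances can be sketched briefly. The version below is an illustration under stated assumptions rather than the formulation used by Figo et al.: the edit distance uses unit costs for insertion, deletion, and substitution, and the DTW function returns the cumulative cost of the cheapest warping path with the absolute difference as the local cost, omitting the 1/K normalization and square root shown above.

```python
import math


def levenshtein(a, b, insert=1, delete=1, substitute=1):
    """Edit distance between two symbol strings via dynamic programming."""
    m, n = len(a), len(b)
    # d[i][j] holds the distance between the first i symbols of a
    # and the first j symbols of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * delete
    for j in range(1, n + 1):
        d[0][j] = j * insert
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            subs = 0 if a[i - 1] == b[j - 1] else substitute
            d[i][j] = min(
                d[i - 1][j] + delete,
                d[i][j - 1] + insert,
                d[i - 1][j - 1] + subs,
            )
    return d[m][n]


def dtw_distance(s, t):
    """Cumulative cost of the cheapest warping path between two sequences."""
    m, n = len(s), len(t)
    cost = [[math.inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            local = abs(s[i - 1] - t[j - 1])
            # One element may map to several consecutive elements of the
            # other sequence, hence the three possible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[m][n]
```

Both functions fill a table of size proportional to the product of the two sequence lengths, which is the source of the computational cost discussed in the evaluation that follows.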

Figo et al. reviewed the suitability of implementing the above features from a
quantitative and qualitative approach [13]. Quantitatively, they analyzed the
complexity of implementation, computational complexity, memory requirements, and
precision. Qualitatively, they analyzed the suitability for inclusion on a mobile device
(cell phone) based on the results of the quantitative analysis. Experimental analysis
was performed to determine the best methods to differentiate between walking,
running, and jumping. The highest accuracy of activity recognition was found to be
generated via features from the time domain. From the frequency domain, coefficient
sum and energy exhibited the highest accuracy, but not high enough considering
the complexity of implementation and computational cost. Overall,
Figo et al. found that the computational simplicity of time domain features
indicated all would be suitable except the correlation and cross-correlation features.
Figo et al. found the opposite to be true for the frequency domain features. Due
to computational cost, only wavelet analysis and the string domain distance finding
metric of Euclidean distance proved suitable for mobile devices. Figo et al. have
provided a comprehensive analysis of numerous features being utilized in the field of
activity recognition.
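
To make the frequency domain features discussed above more concrete, the following sketch computes the DC component, spectral energy, information entropy, and dominant frequency bin for one window. It assumes commonly used definitions (energy as the sum of squared coefficient magnitudes divided by the window length, and entropy computed over the normalized magnitudes with the DC coefficient excluded); these may differ from the exact forms evaluated in [13]. A naive discrete Fourier transform stands in for the FFT to keep the example dependency-free.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of DFT coefficients 0..n/2 (naive O(n^2) transform)."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coefficient))
    return magnitudes


def frequency_features(signal):
    """DC component, spectral energy, entropy, and dominant bin of a window."""
    n = len(signal)
    magnitudes = dft_magnitudes(signal)
    ac = magnitudes[1:]  # all coefficients except the DC term
    total = sum(ac)
    entropy = 0.0
    if total > 0:
        for m in ac:
            p = m / total  # treat normalized magnitudes as a distribution
            if p > 0:
                entropy -= p * math.log2(p)
    dominant = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k]) if ac else 0
    return {
        "dc_component": sum(signal) / n,  # DFT coefficient 0 divided by n
        "spectral_energy": sum(m * m for m in ac) / n,
        "entropy": entropy,
        "dominant_bin": dominant,
    }
```

Even this small example shows why frequency domain features cost more than time domain features: each coefficient requires a pass over the whole window, and an FFT only reduces, rather than removes, that overhead.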

Research into how to select the best set of features, presented in A Novel Feature Selection
Method Based on Normalized Mutual Information, indicates that either
the max-relevance min-redundancy (mRMR) approach or the Normalized Mutual
Information Feature Selection (NMIFS) algorithm is a valid means of incorporating the most
appropriate set of features in an activity recognition model [43]. As noted previously
in slightly different terminology, Vinh, Lee, Park, and D’Auriol define the concept of
feature extraction as the process of generating new features by projecting the original
feature space into a reduced-dimension space. Feature selection is defined as the
technique for selecting a subset of relevant features, which contain information helpful
to distinguishing one classification from another. Feature selection methods fall into
wrapper, embedded, and filter approaches. Wrapper approaches make use of the
classification accuracy to evaluate the usefulness of features at each step. Vinh et al.
found that wrapper-based approaches, which must be trained repeatedly, are
computationally expensive and thus impractical to utilize for large datasets. Embedded methods
of feature selection use particular classifiers to find feature sets. Embedded methods
select features in their training phase, but their ability to use a cost function during
the feature selection process makes them faster than a wrapper approach. Filter
algorithms utilize simple measurements such as correlation to estimate the goodness of
features; as a result, filter methods are fast and effective. Filter algorithms seek to
find the subset of features that maximizes the following:

$$\rho_S = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}}$$

where S is a subset of k features, $\bar{r}_{cf}$ is the mean feature-class correlation ($f \in S$),
and $\bar{r}_{ff}$ is the average feature inter-correlation. $\bar{r}_{cf}$ and $\bar{r}_{ff}$ are calculated similarly
through:

$$\rho_{xy} = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sigma_x \sigma_y}$$

where $\mu$ and $\sigma$ represent the mean and standard deviation, respectively. Vinh et al.
note that the filter method is not able to describe non-linear relationships among
variables where correlation is difficult to establish. In addition, the computation
requires that all of the features be numerical values, hence the desire to normalize the
information for comparative computation. In order to allow as wide a set of candidate
features as possible to be evaluated using a filter technique such as mRMR or NMIFS, Vinh et
al. propose to quantize all data prior to evaluation. The quantization algorithm
ensures that N levels of data are quantized for each feature requiring quantization. The
proposed methodology uses the normalization of mutual information and the feature
independent normalized weights to perform the quantization and would be limited
strictly to the feature selection criterion. The process of quantizing the data
for comparison of feature sets is done in as computationally simple a manner as
possible to limit the utilization of system resources.
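
The correlation-based filter criterion above, combined with a greedy forward search, can be sketched as follows. This is a minimal illustration and not the mRMR or NMIFS algorithm of Vinh et al.: it assumes numerical features and class labels, uses the absolute Pearson correlation for both the feature-class and feature-feature terms, and stops as soon as adding a feature no longer improves the merit. The dictionary layout and function names are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation E[(x - mu_x)(y - mu_y)] / (sigma_x * sigma_y)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    covariance = statistics.fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    return covariance / (sx * sy) if sx and sy else 0.0


def merit(features, labels, subset):
    """Filter merit of a feature subset: k*r_cf / sqrt(k + k(k-1)*r_ff)."""
    k = len(subset)
    r_cf = statistics.fmean([abs(pearson(features[f], labels)) for f in subset])
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [
            abs(pearson(features[a], features[b]))
            for i, a in enumerate(subset)
            for b in subset[i + 1:]
        ]
        r_ff = statistics.fmean(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_selection(features, labels):
    """Add one feature at a time while the merit keeps improving."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        score, name = max(
            (merit(features, labels, selected + [f]), f) for f in remaining
        )
        if score <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = score
    return selected, best
```

Because the merit rewards correlation with the class and penalizes correlation among the selected features, a redundant or uninformative feature lowers the score and is left out.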

Low-resource algorithms that use easily computed statistical attributes
such as range, standard deviation, variance, skewness, kurtosis, and root mean square
have been shown to be quite useful when recognizing and classifying activities [18].
It remains to be seen whether such statistical attributes are equally effective when
classifying a variety of non-activity based entities. It also remains to be seen whether the
identification methods can identify not just entity categories but also sub-categories, where
classification between two of the same entities operating at different modes or
frequencies is possible. As a possible complication introduced by entity sensing, the appearance
of non-periodic entities could pose a problem for statistical entity classification.

While activities such as running, biking, and riding a subway may offer periods
in time where the activity appears non-periodic, by and large their sensor output
(accelerometer and gyroscope) will exhibit periodic functions. When sensing entities,
it is conceivable that different entities may display similar characteristics when analyzed
statistically. For example, using a smart phone to scan the undercarriage of vehicles
that pass overhead may generate a magnetic signature that, when viewed as a series of
ridges and troughs, is unique between vehicles, but when statistically analyzed
yields results too similar for accurate identification. As such, it is necessary to
have additional tools by which to differentiate data; this is where wavelets may offer
additional resolution.


2.14  Resource  Preservation 

In an effort to support continuous sensing, Lu et al., in The Jigsaw Continuous Sensing
Engine for Mobile Phone Applications, propose a methodology which strives to
balance the resource demands of long-term sensing, inference (recognition), and
communications algorithms [28]. The Jigsaw algorithm of Lu et al., as proposed, preserves the
resilience of the accelerometer data processing regardless of phone platform, placement,
or orientation. Jigsaw implements smart admission control and on-demand
processing for the microphone and accelerometer data; admission control and on-demand
processing allow for adaptive throttling of the depth and sophistication of
sensing pipelines when the input data is low quality or uninformative. Adaptive
pipeline processing allows for judicious triggering of power-hungry pipeline stages
when appropriate and takes into account the mobility and behavioral patterns of the
user to drive down energy costs. Additionally, their platform implements the concept
of robust classifiers explored by [35], which allows for different sensors in different
placement positions to accurately recognize activity. As noted in previous
studies, different sensors have different processing costs associated with their
unique sampling rates, feature sets, and other performance characteristics; Jigsaw
tries to optimally balance the functions responsible for sensing with available
computing resources.

The Jigsaw platform was developed to utilize sensor-specific pipelines to process
data from specific sensors when performing continuous monitoring; additionally, it
is optimized and able to run completely on the phone without a server requirement
[28]. The accelerometer pipeline is designed based on the fact that sensor sampling
is not resource prohibitive and merely requires a robust set of inferences. As such,
accelerometer calibration via a one-time, user-transparent process is proposed,
classification of activities via sub-classification for independence from sensor
placement is discussed, and the filtering of extraneous activities and movement is
explained. For accelerometer data, misclassification is most pronounced during periods
of activity overlap, such as answering a phone while riding. The features selected
for classification would be among the following set: mean, variance, mean crossing
rate, spectrum peak, sub-band energy, sub-band energy ratio, and spectral energy.
Misclassification caused by such extraneous activity can be countered by recognizing periods of user
interaction (phone calls, texting) and by recognizing transitional states (standing up,
the act of picking the phone up). The sub-classification of recognizable activities is enhanced
by the use of orientation independent features as much as possible. In addition to
orientation independent features, sub-classification of activities allows for the
recognition of activities regardless of body placement of the cell phone.

When a microphone is used, resource consumption (memory, computation,
and energy usage) is high. Features computed for audio data are: spectral rolloff,
spectral flux, bandwidth, spectral centroid, relative spectral entropy, low energy frame
rate, and 13 other coefficient features. The microphone pipeline utilizes the concept
of admission control and a duty cycle component to regulate the amount of data that
enters its pipeline. When the microphone has detected a sound (signal) that doesn’t
change for some window (period of time), the microphone will save resources and not
perform redundant classification. Additionally, to save computation resources, the
microphone pipeline will short circuit the process for common but distinctive sound
classes [28].
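
The idea of skipping redundant classification when the sound has not changed can be pictured with a small sketch. It is not Jigsaw's implementation: the class name, the Euclidean similarity test, and the `tolerance` parameter are all assumptions made for illustration, and the classifier is any callable supplied by the caller.

```python
import math


class RedundancyGate:
    """Reuse the previous label when a frame's features have barely changed."""

    def __init__(self, classify, tolerance=0.05):
        self.classify = classify      # expensive classifier, supplied by the caller
        self.tolerance = tolerance    # hypothetical similarity threshold
        self.classifier_calls = 0
        self._last_features = None
        self._last_label = None

    def process(self, features):
        unchanged = (
            self._last_features is not None
            and math.dist(features, self._last_features) <= self.tolerance
        )
        if unchanged:
            # The sound is effectively the same as the last classified frame,
            # so the expensive classification step is skipped.
            return self._last_label
        self._last_label = self.classify(features)
        self._last_features = list(features)
        self.classifier_calls += 1
        return self._last_label
```

In this sketch each frame is compared against the last frame that was actually classified, so a slow drift eventually exceeds the tolerance and triggers a fresh classification.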


A GPS pipeline is optimized to learn user activities to budget energy as judiciously
as possible. Energy is preserved by recognizing prior activity trends and working to
pre-calculate duration to ensure availability of resources when required. The GPS
pipeline uses a Markov Decision Process (MDP) to learn an adaptive switching schedule for
resource consumption. In addition, through fusion with the low-resource consuming
accelerometer, the GPS is able to determine additional opportunities to turn off or
on. Lastly, not all applications will require constant sensing; as such, the GPS pipeline
can be tailored to the context being classified.

An example pipeline would look similar to:

Raw Data → Preprocessing → Feature Extraction → Activity Classification → Smoothing

Preprocessing would consist of:

Framing → Normalization → Admission Control → Projection

Feature extraction consists of the feature vector. Activity classification consists of:

Activity Classifier → Output Merging

and the smoothing process consists of a smoothing algorithm that performs a simple
moving average on consecutive data points in order to minimize the effect of outliers.
There are variations between the accelerometer, microphone, and GPS pipelines; the
above model captures the pertinent components of each pipeline [28]. Through the
accelerometer, microphone, and GPS, and their associated pipelines, a set of inferences
and locations is analyzed for the task of activity recognition. The use of pipelines to
save on computational resources, and the addition of sub-classification of activities
based on phone position, results in a sensing platform that places no burden on the
user in terms of calibration, placement, orientation, or awareness of application
activation and deactivation.
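
The smoothing stage mentioned above can be sketched as a simple moving average. The window length of three is an arbitrary choice for illustration, and the sketch operates on numeric data points; the smoothing applied in [28] may differ in both respects.

```python
from collections import deque


def moving_average(values, window=3):
    """Simple moving average over the most recent `window` data points."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    running_sum = 0.0
    smoothed = []
    for value in values:
        if len(recent) == window:
            # The oldest point is about to be pushed out of the deque.
            running_sum -= recent[0]
        recent.append(value)
        running_sum += value
        # Early outputs average over fewer points until the window fills.
        smoothed.append(running_sum / len(recent))
    return smoothed
```

For the input `[1, 1, 10, 1, 1]` the single outlier of 10 is spread across three outputs of 4.0, which illustrates both the damping effect and the lag that a moving average introduces.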

Through a review of the relevant literature, the evolution of activity recognition
can be seen to have progressed from simple, single sensor techniques that differentiated
between a few activities to multi-modal systems that fuse sensor data to detect
numerous pedestrian and motorized transportation modes. Additional research has
proposed value-added applications of activity recognition, offering additional
services or information depending on the activity being performed. Research
into events completely external to the device and unrelated to users, such as
earthquake detection, has revealed the utility of the cell phone as a sensor of more
than just user activity. The recognition of ever more activities and entities would
extend the evolution of the capability of cell phones to recognize external events.
Growing the ability of the cell phone to recognize more requires balancing the
activation of available sensors with resources, selecting the best features for classifying
specific problems, and preserving the cell phone’s normal functions while observing,
recognizing, and classifying.


III. Methodology - Recognition and Identification


3.1  Experiment  Objectives 

In designing the experiment explained below, the goal is to determine
whether the sensors in a cell phone are accurate enough to detect and identify
environmental actors external to the cell phone, referred to as entity recognition. Entity
recognition will allow the cell phone to peer beyond the scope of identifying and
classifying the activity in which a phone user is partaking. If successful, entity recognition will
allow the cell phone, with its environmental sensors and the requisite algorithms, to
become entity aware. With a large enough set of attributes, entity signatures, and
cell phone location awareness, the ability to recognize and classify entities presents
researchers and analysts with a wealth of data.

As noted in the earthquake research of Faulkner [12, 11] and the gamma ray
detection of Cogliati [6], the concept of using a cell phone’s sensors to evaluate the
user’s environment is gaining popularity. Beyond using location data to identify
crowds and/or flocks [24], the earthquake researchers utilize a multi-modal approach and
capture accelerometer and gyroscope data to ascertain the likelihood that an earthquake took
place. In addition to monitoring the CMOS sensor for strikes indicative of gamma ray photon
emissions, Cogliati’s multi-modal approach utilizes GPS and accelerometer data to
determine whether a user is at an airport, and then to identify whether they are
taking off or landing.

In determining whether there is value to a multi-modal approach to analyzing the
environment for entities, an experiment has been designed to collect and analyze data
from several entities, both disparate and similar. The data will be gathered by the
SensorSuite program, written in the iOS native language of Swift and designed
specifically for the purpose of gathering raw data directly from the sensors in the iPhone 5.
Once the data is gathered, it will be analyzed to determine whether entity
detection is possible. If the results indicate it is, the data will be further analyzed to determine
whether there is value added by a multi-modal approach versus using a single sensor.

The experiment captures the conditions affecting the sensors in the cell phone and
reveals details about the environment in regard to magnetic field structure and
fluctuations (magnetometer), gravitational changes due to movement effects
(accelerometer), and torque and inertial effects (gyroscope). In addition to entities
creating conditions that affect a specific sensor in a straightforward manner, such
as a magnetic field being detectable by the magnetometer, it may be possible to
detect the field (and thus an entity) by forces exerted on the gyroscope. In the same
vein, vibrations detectable by the accelerometer and gyroscope as minute changes in
the cell phone’s orientation could also affect the readings from the magnetometer,
as its location relative to a specific spot in the magnetic field may shift.

It is not known whether a cell phone’s sensors offer the fidelity necessary to
’discover’ and ’identify’ an entity in the environment. Thus, data from the cell
phone sensors will be captured and analyzed to determine whether an entity has had
detectable effects on the sensors. Apart from the cell phone, it is known that
entities produce environmental effects that are measurable and detectable via legacy
devices purpose built to sense a specific effect or entity (e.g. seismometers for
earthquakes, gaussmeters for measuring magnetic fields). What requires investigation
is whether a multi-modal approach to entity detection can augment or replace legacy
devices in detection paradigms.

3.2  Experiment  Methodology 

The experiment built to determine whether a multi-modal approach to sensing
entities is obtainable involves 3 distinct groups of control variables for 2 separate
experiments. The first experiment involves capturing data from the fused sensor
package on the environmental effects induced by microwave ovens and subwoofers. The
second experiment involves recording the sensor signature obtained by scanning the
environmental attributes produced by the undercarriage of a vehicle passing over
the recording device. It is believed that the microwave oven, active subwoofer, and
vehicle will each produce a magnetic field detectable by the cell phone; it is unknown
what effect these entities may have on the accelerometer and gyroscope. Capturing
the raw data from each of the three sensors will allow analysis to address several
questions. Is a single sensor stream sufficient for determining which entity is acting
on the sensors, or does accurate recognition require multiple sensors? Which entities
produce statistically significant effects on a particular sensor beyond a baseline
reading where there are no actors save the planetary and structural effects present
in the test environment? If classification to a specific entity is not possible, is it
at least possible to identify the correct category? In order to verify the ability to
classify an entity, this experiment acquires the raw environmental attribute readings
necessary to determine the level of prediction possible.

3.3  Experiment  Boundaries 

The sensors in a cell phone are regularly used to determine location, phone
orientation, motion during app usage, and the particular activity a user may be
engaged in, via a combination of data streams from the gyroscope, accelerometer, data
communications chipsets (LTE, WiFi, Bluetooth, etc.) and GPS. In addition, the
sensors (accelerometer and GPS) have proven useful for detecting the presence and
non-presence of earthquakes [12, 11]. In theory, it may be possible to use the sensor
data to determine the presence of a large number of entities in a cell phone’s
environment. If the data can be combined and/or analyzed in an effective manner,
there are whole classes of legacy detectors that could be augmented and/or replaced.

The experiment, as devised, will measure the gravitational, inertial, and magnetic
effects an entity produces in an environment that are measurable by the sensors
resident in a cell phone. These measurements will be taken by the sensors within the
cell phone and captured via the SensorSuite logging software. The measurements will
record the effects read by each sensor as they relate to the environmental attributes
produced by an entity. The attributes being read by the sensors (magnetism, gravity,
and inertial effects) are always present in the environment and as such will always
return a reading. The entity may alter the environmental attributes; if so, the
sensors within a cell phone may capture the changes.

Each entity involved in this experiment will be measured individually and all
reasonable steps will be taken to ensure there is only one entity present and active
during a specified data logging session. This is necessary to build a set of data that
will allow for the accurate identification of an entity.

3.4  Experiment  Response  Variables


The response variables for this experiment (Table 1) are the sensors within a cell
phone. Using an iPhone 5 as the sensor package, the experiment will log the sensor
output from the cell phone’s accelerometer, gyroscope, and magnetometer. Additional
data from the phone’s GPS and microphone will be captured for posterity as well, but
will not be analyzed in the data analysis phase of this research. The magnetometer,
accelerometer, and gyroscope are each 3-axis measurement devices capable of taking
readings along the x, y, and z axes. The magnetometer measures the magnetic field,
the accelerometer measures gravitational data, and the gyroscope measures torque and
inertial effects. Each sensor is silicon based and determines the environmental
attribute it is responsible for via different means. Knowing the specific values
captured and output by these sensors and how they correlate to an entity and its
effect on the environment is not straightforward. For instance, the output from a
magnetometer can be used to determine where magnetic north is, but the output of its
sensors is not a 180° or 360° heading. As such, additional understanding of the
physics behind each sensor is required to fully interpret the data output. However,
this is not a requirement when it comes to capturing and analyzing the data for
statistically significant differences between actors.

The response variables will be represented in the units native to each sensor. The
magnetometer will capture readings measured in µ-Tesla, which represent the magnetic
field experienced by the sensor. The accelerometer will capture readings measured
in g, which represent the gravitational force being experienced by the sensor.
The gyroscope will capture readings measured in I, which represent the inertial
forces being experienced by the sensor.

Table 1. Response Variables

Outputs                  Units                         Measured Variable
Magnetometer (3-axis)    micro-Tesla (µT)              Magnetic Field
Accelerometer (3-axis)   Gravitational Units (g)       Gravity
Gyroscope (3-axis)       Inertial Momentum Units (I)   Change in Momentum

3.5  Experiment  Control  Variables 

In order to obtain the required environmental attributes via the response variables,
the experiment is set up with a number of control variables and held-constant factors.
The control variables to be used in the experiments are the external entities. In
this case, the experiments will take measurements of the environmental attributes
affected by a 12” subwoofer, microwave ovens, and two automobiles. The experiments
will be conducted to determine which environmental attributes are affected by the
entities when the entities are in an operational status and the cell phone is
capturing the attributes via its sensors. Non-operational (baseline) status readings
will be captured as well to register the structural and geophysical properties present
in the test environment.

For experiment 1, detailed in Table 2, the recording device (iPhone 5) will be
positioned and oriented in the prescribed manner relative to each entity. For entities
1 through 6, the device will be positioned one inch from the back of the subwoofer
enclosure, opposite the subwoofer; this places the device approximately 8.1 inches
from the entity’s magnet. The recording device will be in a head-to-tail fashion with
the face of the device pointed skyward. Additionally, the device will rest on the edge
of a 6x3.5x7.5 inch block of wood, so that the block of wood is no closer than one inch
from the subwoofer enclosure; the wooden block is composed of 4 identically sized
pieces of pine 2x4 that have been glued together. This will point the device at the
approximate middle of the subwoofer enclosure. For entities 7 through 9, the device
will be positioned 6 inches from the front of the microwave, facing the microwave door;
this places the device approximately 13 inches from the entity’s magnetron. The
recording device will be in a head-to-tail fashion with the face of the device pointed
skyward. Additionally, the device will be laid flat on the surface in front of the
microwave, in this case a basement floor. For entity 10, the device will be laid flat
in the same location as the recording sessions for the microwave, with no other
entities present.

Table 2. Control Variables - Experiment 1

Entity #   ID (g)   Entity           Level       Sub-Level             Sessions
1          d        12” Subwoofer    40Hz        dB level ’A’ (a, b)   36
2          f        12” Subwoofer    40Hz        dB level ’B’ (a, c)   32
3          e        12” Subwoofer    40Hz        dB level ’C’ (a, d)   30
4          g        12” Subwoofer    50Hz        dB level ’A’ (a, b)   64
5          i        12” Subwoofer    50Hz        dB level ’B’ (a, c)   32
6          h        12” Subwoofer    50Hz        dB level ’C’ (a, d)   30
7          c        Microwave Oven   1600 Watt   100% Power (e)        30
8          b        Microwave Oven   1000 Watt   100% Power (f)        30
9          a        Microwave Oven   1000 Watt   50% Power (f)         30
10         j        Baseline         n/a         n/a                   40

(a) JL Audio Subwoofer, 12W0v3-4, in a 3/4-inch MDF enclosure built to manufacturer’s specifications
(b) dB level on 3 inline devices: computer -12dB, receiver -11dB, subwoofer amplifier +10 gain
(c) dB level on 3 inline devices: computer -24dB, receiver -11dB, subwoofer amplifier +10 gain
(d) dB level on 3 inline devices: computer -24dB, receiver -11dB, subwoofer amplifier +5 gain
(e) General Electric, model JES1142SP1SS
(f) Hamilton Beach, model HB-P100N30AL-S3
(g) WEKA Confusion Matrix ID


For entities 1 through 9, the recording session will be started with the entity in
the inactive position; once the recording session is active, the entity will be
activated. When the entity has completed a cycle of its prescribed activity, the
recording session will be complete and terminated. The activity prescribed to entities
1 through 6 is to generate the prescribed tone (Table 2) at the prescribed dB level
via Katsura Shareware’s AudioTest program (version 2.1.2). The wave type is Sine with
a 100% pulse width at a sample rate of 44.1 kHz for a duration of 3.0 seconds. The
activity prescribed to entities 7 through 9 is to operate at the prescribed power
level (Table 2) for a duration of either 30 or 60 seconds; the device was set to heat
a bowl of water. Each of these entities, 1 through 10, was recorded at least 30 times.

Table 3. Control Variables - Experiment 2

Entity #   Entity     Level        Sessions
1          Vehicle    Subaru (a)   30
2          Vehicle    Ford (b)     30
3          Baseline   n/a          30

(a) 2013 Subaru Crosstrek XV, Premium trim, CVT transmission
(b) 2013 Ford F-150 XLT, Supercab, 4x4, 145” Wheelbase


For experiment 2, detailed in Table 3, the recording device will be positioned and
oriented in the prescribed manner relative to each entity. For each entity, 1 through
3, the device was placed on the same block of wood used in experiment 1 for the
subwoofer entity recordings. With the recording device in place on top of the block
of wood, entities 1 and 2 were driven at idle speed (varying low speeds) over the
wooden block with the recording device atop. The vehicle was driven over the block so
that it passed over in a front-to-back fashion with minimal braking and so that the
midline of the vehicle was the approximate passover point relative to the recording
device. In addition, the recording device was placed so that at the beginning of each
recording session the top of the device faced the front of the vehicle and at the end
of each recording session the bottom of the device faced the rear of the vehicle; the
device was laid so that its face pointed skyward. For entity 3 in experiment 2, the
device was laid flat on the wooden block in a residential driveway made of concrete,
the same location as the entity 1 and 2 recording sessions.

3.6  Experiment  Factors  Held  Constant


The factors held constant are the structural environments the readings take place in;
in addition, the readings are all gathered at approximately the same time, thus
limiting the amount of change present in potential atmospheric actors. In each
experiment, the recording device will be positioned in approximately the same position
and orientation to record the entities; the recording location will be marked out on
the floor in masking tape. Barring vibrations that move the recording device during a
recording session where the entity is active, the recording device and entity will be
kept at the distance indicated in Section 3.5.

In order to minimize noise factors as much as possible, the experiment data recording
sessions will each have a period of inactivity captured before and after the entity
is put in an active status, thus allowing for the verification of normal baseline
readings. The structural and mechanical noise will be eliminated as much as possible.
The geologic noise will be baselined and should not vary greatly over time. The only
unknown and uncontrollable factor, though always present, will be the amount of
noise from fluctuations in the Earth’s magnetic field. Taken together, the before and
after baselining of a particular test will allow for the identification and reduction
of noise effects in the environmental attribute readings. In addition, for a sensor
such as the magnetometer, it is possible to expose the sensor to a magnetic field of
such strength that the sensor requires a software reset to re-baseline itself and may
not produce accurate results afterward. The magnetometer experiences sensor overflow
when the sum of the absolute values of each axis is > 4912 µT [7].
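
The overflow condition cited above can be checked per sample before a recording
session is accepted. The following Python sketch is illustrative only: it assumes each
magnetometer sample is an (x, y, z) tuple in µT, and the names MAG_OVERFLOW_UT,
magnetometer_overflowed, and usable_samples are hypothetical rather than part of
SensorSuite or the R analysis.

```python
# Overflow limit quoted in the text [7]: the sum of the absolute axis
# values must not exceed 4912 microtesla.
MAG_OVERFLOW_UT = 4912.0


def magnetometer_overflowed(x, y, z, limit=MAG_OVERFLOW_UT):
    """Return True when |x| + |y| + |z| exceeds the overflow limit (in uT)."""
    return abs(x) + abs(y) + abs(z) > limit


def usable_samples(samples, limit=MAG_OVERFLOW_UT):
    """Keep only the (x, y, z) samples recorded below the overflow limit."""
    return [s for s in samples if not magnetometer_overflowed(*s, limit=limit)]
```

A session containing overflowed samples would presumably be discarded or re-recorded
after the sensor is reset; the sketch only flags the condition.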

In order to perform the experiment, certain pieces of equipment will be used to
ensure replicability of the testing environment. The foremost piece is the recording
device, the iPhone 5, with the required internal sensors (magnetometer, accelerometer,
gyroscope, GPS, and microphone). This device will be the same for each recording
session and will be verified by the unique user identification code that is output
with the raw sensor data. Other pieces of equipment will be blocks and tape to outline
the testing locations for placement and replacement of the entities and recording
device if movement should occur before, during, or after a recording session. The
distances listed in Section 3.5 will be verified with a standard tape measure with
markings in both metric and imperial units.

3.7  Experiment  Data  Collection 

The overall goal of the experiment is to analyze the environmental attributes and
how they are affected by specific entities. To that end, the environmental attribute
output from the cell phone sensors is captured to determine whether the effects are
significant enough to measure with the sensors in the iPhone 5. The magnetic,
gravitational, and inertial data output by the respective sensors in the cell phone
will be recorded and written to a SQLite database at the highest rate possible. The
SensorSuite software allows measurements to be captured at rates between 1 and 100 Hz,
the maximum rate possible according to the data sheets available for the sensors
within the iPhone 5 [7]. However, the iOS platform limits the sampling rate to
approximately 40 Hz, presumably for preservation of cell phone resources such as CPU
cycles, bus speed, and battery levels.

Each data recording session will capture a single active session for a particular
entity; as such, there will be at least 30 recording sessions for each entity. The
data samples will be output in comma separated value (CSV) format for processing
in the R statistical computing package. Various screens will be run on the recording
sessions to eliminate outliers, trim off the leading and trailing non-active entity
sensor data packets, smooth the data when necessary, and compute the numerous
attributes selected for analysis. This process will be repeatable and applied to all
data sessions recorded for entity analysis.

3.8  Methodology  -  Signature  Windows 

After gathering the data in the experiments listed previously, the sensor data was
exported from the iOS SQLite database via the SensorSuite program, designed
specifically for the purpose of acquiring data streams from the sensors available on
the iPhone 5. The raw data was then imported into R for statistical analysis. Each
recording session was date-time stamped by the SensorSuite program for uniqueness
and was subsequently broken out separately for analysis. Using the concept of a
frame and window approach present in activity recognition [32, 18, 41] to recognize
the beginning of a detectable environmental disturbance, the data prior to and after
the entity’s active period was trimmed from the data set. The number and length
of frames has changed as the field of activity recognition has matured; an accepted
standard for activity recognition has settled at 4-5 frames per overlapping window,
with each frame comprising a second’s worth of data samples. Analysis is then
performed on each frame to determine which activity is occurring based on the
attributes selected. While this approach works well for activity recognition
algorithms, it is not well suited for entity recognition.

Using the charts in Figure 1 as an example, we will examine why frame sizes of one
second are not advisable for entity detection. The chart on the left depicts a
magnetometer reading for the undercarriage of a 2013 Subaru Crosstrek, taken
across a 5.469 second scan (174 sensor plots). The chart on the right depicts a
magnetometer reading from a position 6 inches in front of a 1000 watt microwave
operating on the 50% power setting, taken across a 60 second scan (2000 sensor
plots).

[Figure 1 panels: "Subaru - Synthesized Magnetometer" (left) and "1000 Watt Microwave (50% power) - Synthesized Magnetometer" (right); x-axis: Sensor Packet (Time)]

Figure 1. Example Magnetometer Signatures: 2013 Subaru Crosstrek Undercarriage
(Left) and 1000 Watt Microwave (Right)


The x-axis represents the sensor packets containing the magnetometer data and
is time ordered. Visually it is evident that an analysis at any particular one second
frame may not yield the sensor data necessary to determine which entity is acting
on the sensors. There are certainly points in each data stream that may be unique
to the entity influencing the sensors, but submitting the entire signature to the
classifier should yield a far more accurate result. It is for these reasons that the
frame and window methodology is being used to detect the beginning and end of a
signature as opposed to the typical technique of capturing a sample for classification.
The algorithm used to find the beginning and end of a signature follows the construct
described in the pseudocode depicted in Algorithm 1:

Algorithm 1: Window Algorithm
Data: Window of Data
Result: Frame Output

initialization;
Set Window Size, W; Set Number of Frames, numF; Frame Size sizeF = W / numF;
Set n to Data(Length); Set i to Data(Front); Set t threshold;
while i < (n - W) do
    j = 0;
    while j < numF, populate Frame_j do
        Frame_j = (i + j * sizeF) to (i + (j + 1) * sizeF - 1);
        if var(Frame_j) > t then
            mark Frame_j as TRUE
        else
            mark Frame_j as FALSE
        j++
    if two consecutive frames are TRUE then
        set entity query start to i;
    if two consecutive frames are FALSE then
        set entity query stop to i + W;
    i++

For the purposes of identifying a start and stop location to gather data for
statistical and wavelet oriented attributes, a window length of twelve (W = 12) with
three frames (numF = 3) was used; thus each frame consisted of 4 data points. The
threshold was set to 10% higher than the maximum baseline variability reading; as
such, a threshold of 1.25 µT (t = 1.25) was used with magnetometer data to identify
the start and stop of an entity’s potential signature. Similarly, applicable
variabilities were used to identify potential start and stop locations from the
accelerometer data stream (5.506 × 10^-6 g) and gyroscope data stream
(1.014 × 10^-5 I) as well. The start and stop locations generated for the three data
streams were correlated to verify their applicability to the sensor session the window
algorithm was searching through. Out of the nine entities requiring a window in order
to trim off the non-signature portion of the sensor data, seven would have been
identifiable with the window algorithm utilizing just the magnetometer data. The
exceptions were the two low decibel subwoofer trials, at 40Hz and 50Hz; these
required the accelerometer data for the window algorithm to return an accurate start
and stop location.

Table 4. Data Session Charts

Trimmed Data                Untrimmed Data
3-axis Magnetometer         3-axis Magnetometer
3-axis Accelerometer        3-axis Accelerometer
3-axis Gyroscope            3-axis Gyroscope
Synthesized Magnetometer    Synthesized Magnetometer
Synthesized Accelerometer   Synthesized Accelerometer
Synthesized Gyroscope       Synthesized Gyroscope
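
The window search with the parameters given above (W = 12, numF = 3, t = 1.25) can be
sketched in Python as follows. This is one possible reading, not the R code used for
the thesis. It assumes a single one-dimensional data stream, the sample variance (as
computed by R's var), a start at the first window containing two consecutive frames
above the threshold, and a stop at the end of the first later window containing two
consecutive frames at or below it. All function names are hypothetical.

```python
from statistics import variance


def threshold_from_baseline(baseline_variances, margin=0.10):
    """Threshold set `margin` (10% here) above the largest baseline variance."""
    return (1.0 + margin) * max(baseline_variances)


def frame_flags(window, num_frames, threshold):
    """Split a window into equal frames; flag frames whose variance exceeds the threshold."""
    size = len(window) // num_frames
    return [
        variance(window[j * size:(j + 1) * size]) > threshold
        for j in range(num_frames)
    ]


def has_run(flags, value, length=2):
    """True when `flags` holds `length` consecutive entries equal to `value`."""
    run = 0
    for flag in flags:
        run = run + 1 if flag == value else 0
        if run >= length:
            return True
    return False


def find_signature(data, window_size=12, num_frames=3, threshold=1.25):
    """Return (start, stop) sample indices of the first signature, or None."""
    start = None
    for i in range(len(data) - window_size + 1):
        flags = frame_flags(data[i:i + window_size], num_frames, threshold)
        if start is None:
            if has_run(flags, True):    # two consecutive active frames
                start = i
        elif has_run(flags, False):     # two consecutive quiet frames
            return start, i + window_size
    # Assumption: a signature still active at the end runs to the last sample.
    return None if start is None else (start, len(data))
```

Because the start index is the beginning of the window rather than of the first active
frame, the trimmed signature keeps a few quiet samples ahead of the disturbance.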


The window algorithm was built and run in the R language and environment for
statistical computing. All preprocessing and statistical processing was performed
in R; an effort was made to limit the varieties of software required to replicate this
project. Once the start and stop locations were identified for each data session,
charts were generated depicting both raw and synthesized sensor readings for the
magnetometer, accelerometer, and gyroscope for both the trimmed and full data session;
thus each data session resulted in 12 charts (Table 4) being created and stored for
visual analysis.


The trimmed data was then used to compute seventy-two statistical attributes for
the classifier. The range, standard deviation (SD), variance, skewness, kurtosis, and
root-mean-square (RMS) were computed for the magnetometer, accelerometer, and
gyroscope, resulting in eighteen attributes. Then, a simple moving average (SMA)
algorithm was applied to each sensor’s data stream to smooth the data. The SMA
algorithm was run with sample sizes of three, five, and seven, and the same six
statistical attributes were computed for each of the resulting ’smoothed’ data
streams. This results in fifty-four new attributes, for a total of seventy-two
attributes. Table 5 lists the attributes computed for each sensor’s data.


Table 5. Attributes for the Magnetometer, Accelerometer, and Gyroscope data

Raw Data   SMA(3)     SMA(5)     SMA(7)
Range      Range      Range      Range
SD         SD         SD         SD
Variance   Variance   Variance   Variance
Skewness   Skewness   Skewness   Skewness
Kurtosis   Kurtosis   Kurtosis   Kurtosis
RMS        RMS        RMS        RMS
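
The attribute count in Table 5 (6 statistics × 4 data variants × 3 sensors = 72) can
be reproduced with a short Python sketch. It is illustrative rather than the R code
used for the thesis. It assumes one synthesized value per sample for each sensor,
sample SD and variance (as in R's sd and var), and moment-based skewness and kurtosis
(m3 / m2^1.5 and m4 / m2^2); the thesis does not state which definitions were used.
Function and attribute names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def sma(values, k):
    """Simple moving average over a sliding window of k samples."""
    return [mean(values[i:i + k]) for i in range(len(values) - k + 1)]


def central_moment(values, order):
    """Population central moment of the given order."""
    m = mean(values)
    return sum((v - m) ** order for v in values) / len(values)


def six_stats(values):
    """The six statistics of Table 5 (the stream must not be constant)."""
    m2 = central_moment(values, 2)
    return {
        "range": max(values) - min(values),
        "sd": stdev(values),
        "variance": variance(values),
        "skewness": central_moment(values, 3) / m2 ** 1.5,
        "kurtosis": central_moment(values, 4) / m2 ** 2,
        "rms": sqrt(sum(v * v for v in values) / len(values)),
    }


def stream_attributes(sensor, values, sma_sizes=(3, 5, 7)):
    """24 attributes for one sensor: raw data plus SMA(3), SMA(5), SMA(7)."""
    attrs = {f"{sensor}_raw_{name}": v for name, v in six_stats(values).items()}
    for k in sma_sizes:
        for name, v in six_stats(sma(values, k)).items():
            attrs[f"{sensor}_sma{k}_{name}"] = v
    return attrs


def session_attributes(streams):
    """72 attributes for a session given {'mag': [...], 'accel': [...], 'gyro': [...]}."""
    attrs = {}
    for sensor, values in streams.items():
        attrs.update(stream_attributes(sensor, values))
    return attrs
```

Each trimmed recording session would yield one such 72-value row, with the entity
label attached as the class for the classifier.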

Additionally, the trimmed data from experiment 2, the vehicle undercarriage scan,
was subjected to discrete wavelet transformation. The vehicle undercarriage
magnetometer signatures had 5 levels of discrete wavelet transform (DWT) performed.
Each level of the DWT results in a set of coefficients that represent the most
significant portions of the signal from the previous level of DWT. As such, each level
of DWT decomposition will have fewer coefficients than the level immediately prior.
After the coefficients are calculated for each DWT level, the results of levels 2
through 5 are ordered, with a set of the highest and lowest coefficients serving as
inputs to the WEKA J4.8 decision tree learner for analysis. Table 6 contains the above
details as well as the number of coefficients being extracted at each level for
inclusion in the J4.8 algorithm. DWT level 5 contributes fewer coefficients than the
other levels due to the nature of decomposition; there are fewer coefficients at
decomposition level 5 to utilize for analysis.

Table 6. Discrete Wavelet Transform Coefficients

DWT Level   # of High Coefficients   # of Low Coefficients
2           10                       10
3           10                       10
4           10                       10
5           3                        3
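
The selection summarized in Table 6 can be sketched in Python as follows. The thesis
does not name the wavelet family or say whether approximation or detail coefficients
were ranked, so this sketch assumes a Haar wavelet and ranks the detail coefficients
purely for illustration. An odd trailing sample is dropped at each level, and all
names (haar_step, haar_details, dwt_attributes, KEEP) are hypothetical.

```python
from math import sqrt


def haar_step(signal):
    """One Haar DWT level: return (approximation, detail) coefficient lists."""
    pairs = len(signal) // 2  # an odd trailing sample is dropped in this sketch
    approx = [(signal[2 * k] + signal[2 * k + 1]) / sqrt(2) for k in range(pairs)]
    detail = [(signal[2 * k] - signal[2 * k + 1]) / sqrt(2) for k in range(pairs)]
    return approx, detail


def haar_details(signal, levels=5):
    """Return {level: detail coefficients} for levels 1..levels."""
    out = {}
    approx = list(signal)
    for level in range(1, levels + 1):
        approx, detail = haar_step(approx)
        out[level] = detail
    return out


# Number of highest and lowest coefficients kept per level (Table 6).
KEEP = {2: 10, 3: 10, 4: 10, 5: 3}


def dwt_attributes(signal, keep=KEEP):
    """Return {level: {'low': [...], 'high': [...]}} of ranked coefficients."""
    details = haar_details(signal, levels=max(keep))
    attrs = {}
    for level, count in keep.items():
        ordered = sorted(details[level])
        attrs[level] = {"low": ordered[:count], "high": ordered[-count:]}
    return attrs
```

With the 174-sample Subaru scan, the halving at each level leaves only 5 coefficients
at level 5 under these assumptions, which is consistent with Table 6 keeping 3 rather
than 10 at that level.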


After the attributes were calculated, they were written to an .arff file for
submission to the machine learning workbench WEKA [16], where various combinations
of attributes were analyzed to determine the best mix for accurate classification
of the entities.
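
As an illustration of the file handed to WEKA, the Python sketch below writes numeric
attributes and a nominal class in ARFF layout. The relation name, attribute names, and
the to_arff helper are hypothetical; the class labels follow the WEKA confusion matrix
IDs of Table 2, and the thesis itself generated the file from R.

```python
def to_arff(relation, attribute_names, class_labels, rows):
    """Build ARFF text; each row is (list of numeric values, class label)."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in attribute_names]
    # The nominal class attribute lists every allowed label in braces.
    lines.append("@attribute entity {" + ",".join(class_labels) + "}")
    lines += ["", "@data"]
    for values, label in rows:
        lines.append(",".join(f"{v:.6g}" for v in values) + "," + label)
    return "\n".join(lines) + "\n"
```

Dropping columns from attribute_names and rows is one way to produce the different
attribute configurations compared in the analysis.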

IV.  Results  and  Analysis  -  Recognition  and  Identification


4.1  Decision  Model  Review 

After gathering the sensor data from the control variables identified in the
methodology section, the data was preprocessed, statistically analyzed, and classified
via two distinct approaches. The sensor data was exported from the iOS SQLite database
via the SensorSuite program, designed specifically for the purpose of acquiring data
streams from the available sensors and exporting those sensor streams for aggregation
and analysis.

In order to determine the ability of known recognition algorithms to accurately
assign an entity to the appropriate classification, two approaches are utilized. Both
approaches use J4.8, the WEKA implementation of the C4.5 revision 8 decision tree
learner, in 14 specific attribute configurations. In the first approach, the learner
is combined with 10-fold cross-validation and a randomly ordered set of 354 instances
to determine the decision tree’s ability to accurately classify entities. In the
second approach, the decision tree learner is paired with separate training and test
data sets. The training data set contains 254 instances and the test set contains 100
instances.
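
The two evaluation schemes can be sketched in Python over instance indices. This is an
illustration under stated assumptions and not WEKA's implementation: the folds here
are built from a plain shuffle, whereas WEKA stratifies its folds by class, and the
function names and seed are hypothetical.

```python
import random


def shuffled_indices(n, seed=1):
    """Indices 0..n-1 in a reproducible random order."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx


def train_test_split(n, n_train, seed=1):
    """Second approach: one fixed split, e.g. 254 training and 100 test instances."""
    idx = shuffled_indices(n, seed)
    return idx[:n_train], idx[n_train:]


def k_folds(n, k=10, seed=1):
    """First approach: k disjoint folds; each fold serves once as the test set."""
    idx = shuffled_indices(n, seed)
    return [idx[f::k] for f in range(k)]
```

For 354 instances and k = 10, four folds hold 36 instances and six hold 35, so every
instance is tested exactly once across the ten runs.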

The parameters used to generate the decision tree model via the J4.8 decision tree
learner are listed in Table 7.

Table 7. J4.8 Decision Tree Learner Parameters

Parameter             Value   Parameter          Value
binarySplits          False   saveInstanceData   False
confidenceFactor      0.25    seed               1
debug                 False   subtreeRaising     True
minNumObj             2       unpruned           False
numFolds              3       useLaplace         False
reducedErrorPruning   False

4.2  Decision  Model  Results

Generating attributes from the raw data, there are six attributes for each of the
four data variants listed in Table 5 (raw data, SMA(3), SMA(5), and SMA(7)); thus
there are 24 attributes for each of the three sensors, yielding a total of 72
attributes for potential inclusion in the decision tree modeled by the J4.8 decision
tree learner. The attribute sets listed in Table 8 were built and compared for their
ability to correctly classify instances.


Table 8. Attribute Set Table

Set #   Mag a   Accel b   Gyro c   Mag a,d   Accel b,d   Gyro c,d
1       X       X         X
2       X       X
3       X                 X
4       X
5               X         X
6               X
7                         X
8                                  X         X           X
9                                  X         X
10                                 X                     X
11                                 X
12                                           X           X
13                                           X
14                                                       X

a Magnetometer abbreviated Mag
b Accelerometer abbreviated Accel
c Gyroscope abbreviated Gyro
d Set does not include SMA attributes
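The 14 attribute sets of Table 8 can also be written down as data, which makes the 6 x 4 x 3 = 72 attribute count easy to check. The sketch below is an assumption-laden illustration: the set-to-sensor mapping follows the text's statement that sets 8 through 14 mirror sets 1 through 7 without SMA attributes, the names are hypothetical, and it assumes that a non-SMA set keeps only the raw (unaveraged) statistics.

```python
# Six statistics are computed per sensor stream (see Section 4.5).
STATISTICS = ("range", "sd", "variance", "skewness", "kurtosis", "rms")

# Four averaging techniques per the text; their names are in Table 5.
N_AVERAGING_TECHNIQUES = 4

# Sensor combinations for sets 1-7, in the order used by Table 8.
SENSOR_COMBOS = [
    ("Mag", "Accel", "Gyro"),
    ("Mag", "Accel"),
    ("Mag", "Gyro"),
    ("Mag",),
    ("Accel", "Gyro"),
    ("Accel",),
    ("Gyro",),
]

ATTRIBUTE_SETS = {}
for number, combo in enumerate(SENSOR_COMBOS, start=1):
    ATTRIBUTE_SETS[number] = {"sensors": combo, "sma": True}
    # Sets 8-14 use the same sensors without the SMA attributes.
    ATTRIBUTE_SETS[number + 7] = {"sensors": combo, "sma": False}


def attribute_count(set_number):
    """Number of attributes offered to the learner for one attribute set."""
    spec = ATTRIBUTE_SETS[set_number]
    # Assumption: a non-SMA set contains only the raw statistics.
    techniques = N_AVERAGING_TECHNIQUES if spec["sma"] else 1
    return len(spec["sensors"]) * len(STATISTICS) * techniques
```

For example, `attribute_count(1)` reproduces the 72 attributes quoted in the text for the full three-sensor SMA set.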


By creating models based on the attribute sets listed in Table 8, results are
generated that help identify which aspects of the multimodal sensor data stream
are most useful for recognizing and classifying the entities being investigated.
The results demonstrate whether there is value added by fusing data for
interrogation. Additionally, the results reveal whether smoothing averages are
helpful, though this last point is harder to prove: smoothing helps to eliminate
outliers in a data stream, and there may not be any outliers in the 354 entity
instances tested.

The desired result of multimodal sensing would be to accurately classify all
entities that a classifier is capable of handling. Careful analysis of the
models generated by the J4.8 learner from the above attribute sets (Table 8)
will reveal that perfect classification is not possible given imperfect data
sets. Even data sets collected in an organized experiment suffer from extraneous
data points, unintended effects, and algorithmic imperfections (such as those
introduced by trimming the data with the windowing algorithm). More detailed
analysis of the correctly and incorrectly classified entities will reveal that
certain sensors are better at classifying entities that exhibit specific
combinations of environmental effects.

For instance, the results listed throughout this section will show that the
magnetometer and the data stream it outputs are an important tool for detecting
microwaves. When analyzing a subwoofer operating at different frequencies and
at different decibel levels, the magnetometer remains important for
differentiating between decibel levels, but the accelerometer and gyroscope,
with their ability to detect rotation and vibration, become important for
differentiating between frequencies.

There may be no best model for entity detection; therefore, it is important to
understand the strengths of a particular model as well as the effects produced
by the entities being classified. Some flexibility in model selection would be
useful so that sensor combinations can be selected according to the entity most
likely to be encountered. As noted in the literature review section, [28]
proposes a similar system with the JIGSAW algorithm.


Using 10-fold cross-validation with each of the attribute sets listed in Table 8,
the models obtained the efficiencies listed in Table 9.


Table 9. 10-Fold Cross-Validation Attribute Set Results

Set #   Correct   Incorrect   Inter-Category   Tree Size   Leaves
1       345       9           4                19          10
2       346       8           4                19          10
3       347       7           3                19          10
4       345       9           2                19          10
5       298       56          4                45          23
6       295       59          4                41          21
7       272       82          8                59          30
8       344       10          3                19          10
9       348       6           0                19          10
10      346       8           4                19          10
11      346       8           0                19          10
12      303       51          4                39          20
13      297       57          4                27          14
14      266       86          6                75          38

Tree size and leaf number vary greatly between certain attribute sets; number of entities remains constant at 10.


From the cross-validation results summarized in Table 9, a few commonalities may
be ascertained. Overfitting can occur with attributes that offer continuous
values for decisions, because continuous attributes lend themselves to the
construction of decision trees with a high number of branches. With the smallest
tree containing 19 nodes (that is, being of size 19) and the largest tree
containing 75 nodes, there is a large disparity in fit. Using an average tree
size of approximately 30 as a cutoff, it is possible to pare the attribute sets
down to those that produce less brittle decision trees. As such, sets 1 through
4, 8 through 11, and 13 are candidates for further consideration.
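The selection of candidate sets and the accuracy rates quoted in this section can be reproduced directly from Table 9. The sketch below copies the table values as printed; the variable and function names are hypothetical, and using 30 as a strict cutoff is one reading of the "approximately 30" average mentioned above.

```python
# Table 9 rows: set -> (correct, incorrect, inter_category, tree_size, leaves)
TABLE_9 = {
    1: (345, 9, 4, 19, 10),
    2: (346, 8, 4, 19, 10),
    3: (347, 7, 3, 19, 10),
    4: (345, 9, 2, 19, 10),
    5: (298, 56, 4, 45, 23),
    6: (295, 59, 4, 41, 21),
    7: (272, 82, 8, 59, 30),
    8: (344, 10, 3, 19, 10),
    9: (348, 6, 0, 19, 10),
    10: (346, 8, 4, 19, 10),
    11: (346, 8, 0, 19, 10),
    12: (303, 51, 4, 39, 20),
    13: (297, 57, 4, 27, 14),
    14: (266, 86, 6, 75, 38),  # copied as printed in the table
}
TOTAL_INSTANCES = 354
TREE_SIZE_CUTOFF = 30  # the text's rounded average tree size


def accuracy_percent(correct, total=TOTAL_INSTANCES):
    """Share of correctly classified instances, as a percentage."""
    return 100.0 * correct / total


# Mean tree size over all 14 models.
mean_tree_size = sum(row[3] for row in TABLE_9.values()) / len(TABLE_9)

# Sets whose trees are smaller than the cutoff are kept as candidates.
candidates = [s for s, row in TABLE_9.items() if row[3] < TREE_SIZE_CUTOFF]
```

The mean tree size works out to roughly 31, and the candidate list matches sets 1 through 4, 8 through 11, and 13. For the 345 correct classifications of sets 1 and 4, `accuracy_percent(345)` evaluates to about 97.46.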

For the candidate attribute sets, Table 9 contains the number of correctly and
incorrectly classified entities. Additionally, the number of inter-category
misclassifications is noted. Inter-category refers to misclassifications where
the classified entity is placed in the incorrect category, not just the
incorrect sub-category. As such, if a subwoofer is classified as a microwave, it
is considered inter-category. If a subwoofer operating at 50Hz is classified as
a subwoofer operating at 40Hz, it is considered intra-category. The goal of
entity recognition is to sub-categorize an entity as accurately as possible;
however, there is value in being able to place an entity into the correct
category even when resolving to the correct sub-category is not possible.
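The distinction between inter-category and intra-category errors can be expressed as a small lookup. This is a sketch under an assumption: the letter-to-category mapping below is read from how this chapter uses the control variable letters (a through c for microwaves, d through i for subwoofers, j for the baseline); Table 2 holds the definitive listing, and the names here are hypothetical.

```python
# Assumed mapping of control variable letters to parent categories.
CATEGORY = {
    "a": "microwave", "b": "microwave", "c": "microwave",
    "d": "subwoofer", "e": "subwoofer", "f": "subwoofer",
    "g": "subwoofer", "h": "subwoofer", "i": "subwoofer",
    "j": "baseline",
}


def error_type(actual, predicted):
    """Label one prediction as correct, intra-category, or inter-category."""
    if actual == predicted:
        return "correct"
    if CATEGORY[actual] == CATEGORY[predicted]:
        return "intra-category"
    return "inter-category"
```

For example, a 40Hz subwoofer sample (f) classified as a microwave (a) is an inter-category error, while a 50Hz subwoofer sample (i) classified as a 40Hz subwoofer (f) is intra-category.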

The results of the well-fit decision trees built from attribute sets 1 through 4
all support the inclusion of magnetometer data in the classification model. Note
that sets 1 through 4 include the SMA data for analysis and should be considered
before the non-SMA attribute sets if elimination of outlier data via averaging
is desired. A quick review of the summarized data suggests that, with the
control variables utilized in the experiment discussed in section 3, an
attribute set based exclusively on magnetometer data is sufficient to categorize
the entities correctly. Indeed, the strictly magnetometer set (attribute set 4)
posts the most accurate results among sets 1 through 4 when judged by
inter-category misclassifications. This is due to the measurable magnetic fields
generated by the control variables.

The magnetic field strength as measured by the magnetometer in the smart phone
allows for the accurate classification of the entities utilized as control
variables. There is enough magnetic difference between microwaves, subwoofers,
and the baseline environment to allow for near perfect categorization. Only 2
entities from attribute set 4 are classified outside their respective
categories; the other 7 entities are misclassified within their sub-categories.
The statistical classification methods are accurate enough to determine entity
categories for 97.45% of the entity samples. In fact, analysis of the decision
tree generated by WEKA (displayed in Figure 2) indicates that the model is able
to perform accurate classification using the SD (for the raw data, SMA 3, and
SMA 7) and the RMS (for the raw data and SMA 3).


Figure 2. Attribute Set 4 Decision Tree


The confusion matrix (Figure 3) for attribute set 4 shows that two of the
subwoofer entities are misclassified as microwave entities: f and i are
classified as a and b, respectively. The control variable table for experiment 1
is Table 2, and the entities correlate directly to their confusion matrix letter
values. It can be surmised that the magnetic qualities of one of the entity
samples for the 40Hz subwoofer and one of the entity samples for the 50Hz
subwoofer are similar to the magnetic qualities output by the 1000 watt
microwave. This pattern of misclassification is not static across attribute sets
1, 2, 3, and 4. In attribute set 1, there are four category level
misclassifications: two in the microwave category (one each on the 1000 watt
microwave (b) and the 1600 watt microwave (c)), one in the (h) level control
variable entity (50Hz subwoofer), and another in the (j) level control variable
entity (baseline). In attribute set 2, there are four category level
misclassifications: one at the microwave level (c), one at the subwoofer
category level (h), and two at the (j) level control variable entity (baseline).
In attribute set 3, there are three category level misclassifications: two in
the subwoofer category level (one each at 40Hz (f) and 50Hz (i)) and another in
the (j) level control variable entity (baseline). A review of attribute sets 1
through 3 helps reveal what decision tree nodes are offered compared to the
previously discussed attribute set 4, and can offer an intuition as to why the
misclassified entities change between models.

Figure 3. Attribute Set 4 Confusion Matrix


Within the SMA attribute sets, the set with the magnetometer and gyroscope
(attribute set 3) offers the lowest number of incorrectly classified entities.
The decision tree includes 5 magnetometer attribute nodes and 4 gyroscope
attribute nodes. With 7 misclassifications, set 3 offers 2 more correct
classifications than set 4 for an accuracy rate of 98.02%. However, there is 1
additional inter-category misclassification. While overall sub-level
classification has improved, classification in the parent categories has
worsened. Instead of relying strictly on statistical values based on magnetic
properties, the model built from attribute set 3 includes statistical values
based on the gyroscope in addition to the magnetometer values. This allows the
model to more accurately classify the subwoofer entities via the subwoofer sound
wave and the torque experienced by the smart phone from the sound wave. The
addition of torque nodes in the decision tree introduces a misclassified
baseline reading that was classified correctly when only magnetic statistical
methods were utilized, as in set 4. Additionally, the two subwoofer entities
that were misclassified in attribute set 4 remain in the set 3 confusion matrix.
This demonstrates how a multimodal approach offers both additional resolution
possibilities and potential misclassifications due to similarity in statistical
decision node values.

Attribute set 2 decreases in accuracy relative to attribute set 3, with one
additional misclassification and one additional inter-category
misclassification. The decision tree includes 5 accelerometer attribute nodes
and 4 magnetometer attribute nodes. The accuracy rate of set 2 is 97.74%. This
is still better than the 9 misclassifications offered by attribute set 4, but
worse than the inter-category misclassification count of 2 for set 4. The
inclusion of accelerometer data helped eliminate the misclassified subwoofer
entries present in sets 3 and 4. However, the loss of magnetometer based
decision nodes increases the baseline misclassifications to 2 and introduces a
microwave misclassified as a subwoofer. Lastly, a new misclassification appears
in the subwoofers, where 1 control variable (h) is classified as a baseline
reading. Attribute set 2 continues to demonstrate how a particular model could
be tuned to certain types of entities, in this case those that induce movement
on a sensor platform.

Attribute set 1 contains the magnetometer, accelerometer, and gyroscope
attributes, as well as their SMA statistical attributes. The number of
misclassifications, 9, is the same as for attribute set 4, making attribute set
1 97.45% accurate, but 4 of them are inter-category misclassifications. The
model produced by the J4.8 decision tree maker contains 2 accelerometer
attribute nodes, 3 gyroscope attribute nodes, and 3 magnetometer attribute
nodes. The inclusion of all 3 sensor attribute sets results in the
misclassification of 2 microwave control variables, both as subwoofers.
Additionally, the subwoofer from set 2 classified as a baseline entity is now
classified as a microwave. Lastly, there is a baseline reading classified as a
subwoofer, indicating that the reduced number of magnetometer decision points is
impacting baseline classification. Figure 4 contains the confusion matrices for
attribute sets 1 through 4.

Attribute Set 1 (Top-Left), Attribute Set 2 (Top-Right), Attribute Set 3 (Bottom-Left), and Attribute Set 4 (Bottom-Right)


Figure  4.  Attribute  Sets  1-4  Confusion  Matrices 


The candidate attribute sets that do not include SMA values are attribute sets 8
through 11 and attribute set 13. The model generated for attribute set 13 is 42%
larger than the models for attribute sets 1 through 4, and as such may be
overfit. Attribute sets 1 through 7 are the SMA versions and correspond directly
to the non-SMA attribute sets 8 through 14, respectively. As such, the
expectation that attribute sets 4 and 11, the strictly magnetometer attribute
sets, would perform similarly is upheld. Attribute set 11 performs slightly
better than attribute set 4, with 1 additional correct classification and zero
inter-category misclassifications. The results of attribute set 9 compared to
set 2 are likewise similar. Attribute set 9 has 2 additional correct
classifications and zero inter-category misclassifications, demonstrating that
the pairing of the accelerometer and magnetometer can produce highly accurate
categorical classification results.

The decision tree for attribute set 13 is a tree of size 27, which as noted
previously is 42% larger than the decision trees of size 19 for the previously
discussed attribute sets. While this model is closer to overfit than the
previous models, it is not as egregiously overfit as the other non-magnetometer
based attribute sets. Attribute set 13 is based strictly on non-SMA
accelerometer data and reveals the possibility of constructing decision trees
based on sensors other than a magnetometer for entity detection, recognition,
and classification. Once again, this sheds light on the need to construct a
decision tree making algorithm that is geared towards a particular set of
categories eligible for detection.

The fact that the non-SMA attribute sets produced results very similar to the
SMA attribute sets indicates that the training and testing data are clean enough
to minimize the need to average data samples to eliminate noise. A real world
application of entity sensing would include spikes in various sensor data that
may not be indicative of an entity's existence. As such, the SMA attribute sets,
as seen in the literature for activity recognition, would probably be more
applicable in a non-experiment driven entity classification scheme. The non-SMA
attribute sets would include peaks and troughs that may make accurate
classification more difficult.

4.3 Training and Test Set Review


As a complement to the 10-fold cross-validation reviewed in Section 4.1, a model
was built from a training set of 254 instances. A test set of 100 instances was
run with each of the attribute sets listed in Table 8, and the models obtained
the efficiencies listed in Table 10. Comparison between Table 10 and Table 9
shows that, with little exception, the models created for the attribute sets
listed in Table 8 are similar between the two model creation methods. This is to
be expected, as the J4.8 learner is utilized to generate both sets of models.
The only major difference is the size of the held-out portion: the 100 entity
test set is approximately 28% of the data, versus 10% per fold in the 10-fold
cross-validation methodology.


Table  10.  Training  and  Test  Model  Attribute  Set  Results 


Set  # 
Inter-Category b

Tree Size
47

24 

14 

74 

26 

2 

51 

26 

Tree size and leaf number vary greatly between certain attribute sets; number of entities remains constant at 10.
b Classification accuracy based on test set results. Training set consists of 254 entity instances, test set consists of 100 entity instances.

4.4 Graph Analysis


Visual analysis of a data session from a random instance of each of the control
variables provides insight into which sensors are useful for detecting a
particular entity. While a visual analysis may provide insight into sensor
selection, it is not a substitute for statistical analysis, because the sensors
may detect changes too subtle to see. The plots in Figures 5, 6, and 7 are on a
time scale (x-axis), and the data from each sensor's 3 axes have been normalized
by synthesizing them into a single magnitude (y-axis), where

$$A_{sensor} = \sqrt{(A_x)^2 + (A_y)^2 + (A_z)^2}$$

Thus the graphs depict the orientation independent effects experienced by the
smart phone sensors.
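The magnitude calculation used for the plots is straightforward to sketch. The function names below are hypothetical, and the sample readings are made up for illustration; only the formula itself comes from the text.

```python
import math


def magnitude(x, y, z):
    """Combine one three-axis sensor reading into a single magnitude."""
    return math.sqrt(x * x + y * y + z * z)


def magnitude_stream(samples):
    """Apply the magnitude to every (x, y, z) reading in a data session."""
    return [magnitude(x, y, z) for x, y, z in samples]


# The same physical reading seen in two device orientations
# produces the same magnitude.
flat_on_table = magnitude(0.0, 0.0, 1.0)
standing_on_edge = magnitude(1.0, 0.0, 0.0)
```

Because the result depends only on the length of the vector, rotating the phone changes the individual axis values but not the plotted value, which is what makes the graphs orientation independent.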

The magnetometer charts in Figure 5 provide evidence that each of the control
variables produces different magnetic effects. Some of these effects are quite
apparent visually, such as the difference between the 1000 watt microwave
operating at 50% power and at 100% power. Others are less apparent, though still
present, such as the change in range exhibited between the various subwoofer dB
levels. For the subwoofer, the louder dB levels (the larger dB values) produce a
more measurable magnetic field. This will be revealed in the analysis of the
statistical data in Tables 11, 12, and 13.

The accelerometer charts in Figure 6 provide evidence that some of the control
variables produce seemingly different gravitational effects. Each of the
microwaves has a fairly similar accelerometer data range. However, each of the
subwoofer control variables appears different. With decreasing dB level, the
gravitational effects due to sound wave vibration decrease. This remains
consistent between the two frequency levels. Additionally, sampling rate seems
to play a role when comparing the 40Hz and 50Hz output, presumably due to the
Nyquist interval.


The gyroscope charts in Figure 7 reveal similar environmental aspects to the
accelerometer charts in Figure 6. An entity producing measurable torque effects
is likely to also be producing vibrational effects that are measurable by the
accelerometer. Thus it is no surprise that the accelerometer and gyroscope
charts are visually similar.


Microwave (Top Row) at 1000 Watt at 50% Power (Left), at 1000 Watt at 100% Power (Center), at 1600 Watt
at 100% Power (Right), 40Hz Subwoofer (Middle Row) at -13dB (Left), at -25dB (Center), at -30dB (Right), 50Hz
Subwoofer (Bottom Row) at -13dB (Left), at -25dB (Center), and at -30dB (Right)


Figure 5. Randomly Selected Control Variable Magnetometer Data


Microwave (Top Row) at 1000 Watt at 50% Power (Left), at 1000 Watt at 100% Power (Center), at 1600 Watt
at 100% Power (Right), 40Hz Subwoofer (Middle Row) at -13dB (Left), at -25dB (Center), at -30dB (Right), 50Hz
Subwoofer (Bottom Row) at -13dB (Left), at -25dB (Center), and at -30dB (Right)


Figure 6. Randomly Selected Control Variable Accelerometer Data


Microwave (Top Row) at 1000 Watt at 50% Power (Left), at 1000 Watt at 100% Power (Center), at 1600 Watt
at 100% Power (Right), 40Hz Subwoofer (Middle Row) at -13dB (Left), at -25dB (Center), at -30dB (Right), 50Hz
Subwoofer (Bottom Row) at -13dB (Left), at -25dB (Center), and at -30dB (Right)


Figure  7.  Randomly  Selected  Control  Variable  Gyroscope  Data 


4.5  Statistical  Analysis 


Analyzing the mean of the statistical values produced for each of the control
variable categories helps to show which qualities of the graphs in Figures 5, 6,
and 7 are useful for classification purposes. Included are the range, SD,
variance, skewness, kurtosis, and RMS. Range represents the difference between
the high and low readings in the sensor data. SD is the standard deviation, the
amount of variation from the average. Variance, the square of the SD, measures
how far a set of numbers is spread out. Skewness is a measure of the asymmetry
of the probability distribution. Kurtosis is a measure of the peakedness of a
probability distribution; like skewness, it is a descriptor of the
distribution's shape. RMS, the root mean square, measures the magnitude of the
data stream.
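The six statistics defined above can be computed for one data session as follows. This is a sketch under stated assumptions, not the thesis's implementation: it uses the sample (n - 1) variance and the non-excess (Pearson) kurtosis, for which a normal distribution scores 3; the baseline kurtosis values near 3 in the tables are consistent with that convention, but the exact estimators used are not stated. The function name is hypothetical.

```python
import math


def window_statistics(values):
    """Range, SD, variance, skewness, kurtosis, and RMS of a data stream.

    Requires at least two values that are not all identical.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    # Sample variance (n - 1 denominator) and its square root.
    variance = sum(d * d for d in deviations) / (n - 1)
    sd = math.sqrt(variance)

    # Central moments (n denominator) for the shape descriptors.
    m2 = sum(d ** 2 for d in deviations) / n
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n

    return {
        "range": max(values) - min(values),
        "sd": sd,
        "variance": variance,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess: normal data gives about 3
        "rms": math.sqrt(sum(v * v for v in values) / n),
    }


stats = window_statistics([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Note that RMS is taken about zero rather than about the mean, so it reflects the overall level of the stream, while SD and variance reflect only its spread.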


Table 11. Magnetometer Statistical Values for Control Variables

Control Variable   Range a   SD a     Variance a   Skewness a   Kurtosis a   RMS a
a                  19.857    2.866    8.225        0.109        3.878        176750
b                  21.425    3.82     14.614       0.085        2.547        176860
c                  18.009    3.861    14.917       0.184        2.493        157474
d                  39.536    11.93    143.018      0.061        1.665        2705813
f                  11.748    3.045    9.289        0.009        1.979        3303501
e                  5.515     1.121    1.264        -0.095       2.789        3302969
g                  62.17     19.194   386.335      0.121        1.67         2090392
i                  15.567    4.445    19.919       0.053        1.785        3302787
h                  5.186     1.051    1.111        0.075        2.908        3297486
j                  5.484     0.940    0.885        -0.013       2.909        1005972

a All values were read from WEKA Explorer's attribute section and are limited to 3 significant digits.


The magnetometer's statistical output (Table 11) helps illuminate some of the
pertinent details for the control variables. For instance, when attempting to
distinguish the 1000 watt microwave at 100% power from the same microwave at 50%
power, the range may not offer enough statistical difference to be useful. The
variance, on the other hand, offers a more appealing attribute for
classification purposes. The J4.8 decision tree maker may not always choose the
attribute that appears best for classification, but it will choose an attribute
that helps classify an entity. What this means is that even though the
difference in magnetic variance seems to be a clear choice for differentiating
between a 1000 watt microwave operating at 2 different power settings, the
classifier utilizes the standard deviation. This doesn't make the classifier
wrong in any sense; it just shows that one's intuition may not match the
decision maker's algorithms. T-tests performed on the attributes chosen for the
decision tree verify the statistical validity of the J4.8 decision tree maker's
choices (Table 14).

Within the magnetometer data, the statistical values for the microwaves fall
within a relatively narrow range. The differences are enough for accurate
classification, and the decision tree does well at classifying the microwaves
correctly. For the subwoofer, at both the 40Hz and 50Hz frequencies, there are
significant differences in the magnetometer values between dB levels. This
confirms the effects seen on the graphs in Figure 5, as the magnetic properties
being measured are directly correlated to the dB level. The larger the dB, the
larger the magnetic field generated by the subwoofer. This effect is most
pronounced in the range, SD, and variance. The lowest dB level subwoofer
attributes are similar to those seen in the baseline statistics, indicating that
magnetic values may not reveal the presence of a quiet subwoofer.

Analyzing  the  accelerometer  statistical  values  in  Table  12  helps  to  identify  when 
accelerometer  data  may  be  useful  for  categorization.  Since  the  decision  tree  generated 
by  the  J4.8  was  of  size  27  for  our  attribute  set  of  strictly  raw  accelerometer  data,  it  is 
a  fairly  brittle  decision  tree  that  would  benefit  from  the  inclusion  of  another  sensor’s 
attributes. 

The raw values from the accelerometer are a function of gravity, with a purely
stationary reading being a value of 1.0 for the force of gravity. As the
strength of gravity varies with proximity to other objects, elevation, and
latitudinal position on the earth, a reading of exactly 1.0 should not be
expected. As such, the apparent variance in readings experienced by a nearly
stationary smart phone will be on the order of 3 to 6 orders of magnitude
smaller than 1.0. Combined with the manner in which WEKA displays the attribute
values in the Explorer tab, the small differences in SD and variance in Table 12
are not evident. What is evident is that the subwoofer produces a sound wave
that vibrates the smart phone in a measurable manner. The dB level isn't the
only aspect affecting the gravitational field read by the accelerometer; the
frequency also affects the sensor readings. The accelerometer range reveals that
a larger dB causes the smart phone to register larger changes in gravitational
force. Indeed, compared with the baseline range of 0.012 (control variable j),
even the microwave ovens affect the environment to some degree (control
variables a, b, and c).


Table 12. Accelerometer Statistical Values for Control Variables

Control Variable   Range a   SD a     Variance a   Skewness a   Kurtosis a   RMS a
a                  0.014     0.002    0 b          -0.003       2.986        0.952
b                  0.014     0.002    0 b          -0.009       3.023        0.954
c                  0.017     0.002    0 b          -0.128       4.419        0.955
d                  0.069     0.021    0 b          -0.137       1.607        0.939
f                  0.038     0.011    0 b          -0.039       1.728        0.950
e                  0.017     0.004    0 b          0.063        2.402        0.948
g                  0.059     0.018    0 b          -0.170       1.788        0.935
i                  0.022     0.005    0 b          -0.088       2.110        0.947
h                  0.013     0.003    0 b          0.061        2.740        0.941
j                  0.012     0.002    0 b          0.030        2.993        0.938

a All values were read from WEKA Explorer's attribute section and are limited to 3 significant digits.
b Value not 0, see note a.


The values output by the gyroscope differ from both the magnetometer and
accelerometer output. The magnetometer and accelerometer measure physical
properties that are always present on earth and are easy for us to comprehend.
The gyroscope measures the amount of rotation being experienced by the phone and
as such is affected to some degree by the rotation of the earth.


The gyroscope's statistical values are listed in Table 13 and continue to
illustrate a few of the points highlighted in Tables 11 and 12. First, the
gyroscope measures readily visible differences between the loudest dB level for
each frequency and the two quieter dB levels. This shows that with a loud enough
sound wave, not only does the smart phone experience gravitational changes
related to vibrations, but the phone itself is also rotating to some degree.


Table 13. Gyroscope Statistical Values for Control Variables

Control Variable   Range a   SD a     Variance a   Skewness a   Kurtosis a   RMS a
a                  0.019     0.003    0 b          0.095        3.038        0.004
b                  0.019     0.003    0 b          0.083        3.098        0.004
c                  0.017     0.003    0 b          0.064        3.074        0.004
d                  0.051     0.014    0 b          0.336        1.964        0.004
f                  0.016     0.003    0 b          0.037        2.822        0.003
e                  0.017     0.003    0 b          0.107        2.895        0.003
g                  0.022     0.005    0 b          0.140        2.620        0.003
i                  0.017     0.003    0 b          0.106        2.844        0.003
h                  0.016     0.003    0 b          0.215        3.009        0.003
j                  0.017     0.003    0 b          0.110        3.052        0.002

a All values were read from WEKA Explorer's attribute section and are limited to 3 significant digits.
b Value not 0, see note a.


The subwoofer entities with the lowest dB level do not produce a magnetic field
that is readily discernible in the bottom-right of Figure 5; however, the
effects are pronounced enough that both the accelerometer and gyroscope display
a feature in the bottom-right of their respective graphs in Figures 6 and 7.
Combined with the decision tree built by WEKA that utilizes the gyroscope's RMS
value to distinguish between baseline and non-baseline entities, there is a
strong case for including gyroscope output in an entity recognition algorithm.


4.6  Statistical  Analysis  of  Attribute  Set  9 

In order to prove that the decision nodes chosen for the decision tree are
statistically valid, the nodes in the most accurate attribute set were chosen
for analysis. The decision tree generated from attribute set 9 resulted in 6
misclassifications, with zero inter-category misclassifications. Figure 8 shows
the decision tree generated by WEKA.


Figure 8. Attribute Set 9 Decision Tree


Utilizing  the  decision  nodes  shown  in  Figure  8,  t-tests  were  performed  on  all  the 
nodes.  Table  14  shows  the  results  of  the  t-tests  performed  in  R  on  the  raw  statistical 
attributes  generated  from  the  entity  data  sessions.  With  very  small  p-values,  the 
results  indicate  the  validity  of  using  these  attributes  as  decision  nodes  in  the  decision 
tree. Because this analysis was performed in R, the results carry far more
significant digits than the values shown in Tables 12 and 13.

Table 14. Attribute Set 9 T-Test Results

Node Entities                        Attribute    t-value     d.f.^a    p-value
j & h                                Mag RMS       13.5793     39.00    2.30 x 10^-16
c & b                                Mag RMS      195.9860     51.16    2.98 x 10^-75
(h, j) & e                           Accel SD      26.6304     93.73    1.68 x 10^-45
(a, b, c, e, h, j) & (d, f, g, i)    Accel SD      19.1605    164.87    8.56 x 10^-44
(b, c) & a                           Mag SD        38.8817     72.03    4.40 x 10^-50
(a, b, c) & (e, h, j)                Mag SD        48.4696     95.69    3.80 x 10^-69
d & g                                Mag SD        13.1731     71.50    8.38 x 10^-21
f & i                                Accel Var     53.0371     38.30    1.83 x 10^-37
d & g                                Mag Range     23.1603    104.01    7.54 x 10^-43

a  d.f. stands for degrees of freedom
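
The fractional degrees of freedom in Table 14 are consistent with Welch's unequal-variance t-test, which is the default behavior of R's t.test function. Assuming that is the test that was run, the following minimal Python sketch shows how a t-value and its degrees of freedom could be computed for one decision node. The function name and the sample values are hypothetical placeholders, not data from the experiments.

```python
import math
from statistics import mean, variance


def welch_t_test(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    # Squared standard error of each group mean (sample variance / n).
    sx = variance(x) / nx
    sy = variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(sx + sy)
    # Welch-Satterthwaite approximation; usually yields a non-integer value.
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return t, df


# Synthetic placeholder attribute values for two groups of entities.
group_1 = [10.1, 10.4, 9.8, 10.2, 10.0]
group_2 = [12.0, 11.7, 12.3, 12.1, 11.9]
t_stat, dof = welch_t_test(group_1, group_2)
print(f"t = {t_stat:.4f}, d.f. = {dof:.2f}")
```

The p-value then follows from the Student t distribution with the computed degrees of freedom; that step is omitted here because it needs a statistics library or a numerical integration routine.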

V. Results and Analysis - Scanning


5.1  Experiment  2 

After gathering the sensor data from the control variables for experiment 2 (Table 3)
identified in the methodology section, the data was preprocessed, statistically
analyzed, transformed, and classified via two distinct approaches. The sensor data
was exported from the iOS SQLite database via the SensorSuite program, which was
designed specifically to acquire data streams from the available sensors and
export those sensor streams for aggregation and analysis.

5.2  Statistical  Attributes 

Each vehicle's undercarriage was scanned 30 times using the methodology explained
in Section 3.2. In addition, 30 baseline readings were taken with no vehicle present.
Thus there are 90 readings in total across the 2 vehicles and the baseline that serve
as control variables. The attribute sets utilized in the statistical analysis are the
same as those listed in Table 5.

The results of 10-fold cross-validation on the vehicle undercarriage experiment
show that it is possible to correctly distinguish between the two vehicles utilized as
control variables. The best results, with accuracy rates of 96.67%, are obtained with
attribute sets 1 through 4 and 8 through 11, which are the attribute sets that contain
the magnetometer data. The decision tree on the left in Figure 9 is generated from all
72 possible attributes listed in Table 5. Of the 72 possible attributes, only 2
are utilized by the decision tree. In attribute sets that contain both magnetometer
and gyroscope attributes, the decision tree maker generated trees that contain a
gyroscope RMS attribute (either raw or from SMA3) and the magnetometer's raw
range attribute. In the decision tree on the right in Figure 9, the only attribute used
is the magnetometer's raw range attribute. An additional decision tree was made
with purely SMA based magnetometer data (attribute set 15) to determine whether
classification would improve with smoothed data. This yielded the best results among
the statistical attribute sets, with 2 incorrectly classified instances, using only the
magnetometer's SMA3 range attribute; the decision tree resulted in 1 inter-category
misclassification.
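
As an illustration of the attributes named above, the following Python sketch computes a range, an RMS, and a standard deviation from a raw series and from a smoothed copy of it. It assumes that SMA3 denotes a simple moving average over a 3-sample window, which is an interpretation rather than something stated in this section; the function names and sample values are likewise hypothetical.

```python
import math


def simple_moving_average(samples, window=3):
    """Smooth a series with a trailing simple moving average (SMA)."""
    return [
        sum(samples[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(samples))
    ]


def value_range(samples):
    """Difference between the largest and smallest sample."""
    return max(samples) - min(samples)


def rms(samples):
    """Root mean square of the samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def std_dev(samples):
    """Sample standard deviation (n - 1 denominator)."""
    m = sum(samples) / len(samples)
    return math.sqrt(sum((v - m) ** 2 for v in samples) / (len(samples) - 1))


# Hypothetical magnetometer magnitudes containing one sharp peak.
raw = [40.0, 42.0, 47.0, 90.0, 55.0, 41.0]
smoothed = simple_moving_average(raw, window=3)

print("raw range:", value_range(raw))
print("SMA3 range:", value_range(smoothed))
print("raw RMS:", rms(raw), "raw SD:", std_dev(raw))
```

Smoothing lowers the peak and therefore narrows the range, which is one plausible reason a range attribute computed on smoothed data could separate classes differently from its raw counterpart.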


From the decision trees generated for Table 15, the typical confusion matrix
(attribute sets 1 through 4 and 8 through 11) is shown in Figure 10. The confusion
matrix for attribute set 15 contains 1 additional correct classification for the Ford.


Attribute sets that do not include magnetometer data either produced very brittle,
overfit decision trees or gave results that were not as accurate as those
including magnetometer attributes. Attribute set 5 managed to produce a satisfactory
number of correct classifications, at an accuracy rate of 91.11%, with SMA based
accelerometer and gyroscope attributes. With a total of 6 leaves and three control
variables to classify between, attribute set 5 is probably overfit.

=== Confusion Matrix ===

  a  b  c   <-- classified as
 28  2  0 |  a = Ford
  0 30  0 |  b = Subaru
  1  0 29 |  c = Baseline

Figure 10. Attribute Set Confusion Matrix For Experiment 2
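
The counts reported for the typical attribute sets can be recovered from the confusion matrix in Figure 10. The Python sketch below does so, under the assumption that an inter-category misclassification is one that crosses the boundary between the vehicle category and the baseline category; the variable names are illustrative only.

```python
# Rows are actual classes, columns are predicted classes (Figure 10 layout).
labels = ["Ford", "Subaru", "Baseline"]
matrix = [
    [28, 2, 0],
    [0, 30, 0],
    [1, 0, 29],
]

# Assumed grouping: both vehicles form one category, the baseline another.
category = {"Ford": "vehicle", "Subaru": "vehicle", "Baseline": "baseline"}

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))
incorrect = total - correct

# Count errors whose actual and predicted classes lie in different categories.
inter_category = sum(
    matrix[i][j]
    for i, actual in enumerate(labels)
    for j, predicted in enumerate(labels)
    if category[actual] != category[predicted]
)

accuracy = 100.0 * correct / total
print(correct, incorrect, inter_category, f"{accuracy:.2f}%")
```

With the values in Figure 10 this gives 87 correct, 3 incorrect, and 1 inter-category misclassification out of 90 readings, matching the 96.67% accuracy quoted in the text.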

Table 15. Vehicle 10-Fold Cross-Validation Attribute Set Results

Set #   Correct   Incorrect   Inter-Category   Tree Size   Leaves
1       87        3           1                5           3
2       87        3           1                5           3
3       87        3           1                5           3
4       87        3           1                5           3
5       82        8           1                7           4
6       82        8           2                11          6
7       68        22          1                11          6
8       87        3           1                5           3
9       87        3           1                5           3
10      87        3           1                5           3
11      87        3           1                5           3
12      84        6           1                11          6
13      64        26          15               19          10
14      80        10          1                9           5
15^b    88        2           1                5           3

a  Tree size and leaf number vary greatly between certain attribute sets; number of entities remains constant at 10.
b  SMA based magnetometer attributes.


The results from the attribute sets utilized in experiment 2 demonstrate that the
magnetometer is the most valuable sensor for this type of entity classification. The
vehicle's undercarriage affects the magnetic field sensed by the magnetometer in a
manner significant enough that distinguishing between two different vehicles is
possible with just magnetic attributes. The typical vehicle signatures for both control
variables are shown in Figure 11. When presented the ability to build a decision tree
from all available attributes, the J4.8 utilizes a gyroscope attribute that distinguishes
between the presence of a vehicle control variable and the baseline readings. This
gyroscope attribute suggests that detectable motion is being experienced
by the smart phone; however, when gyroscope attributes are withheld, the resulting
purely magnetometer based decision tree is just as accurate for this experiment.

[Figure 11 panels; recoverable panel title: Subaru - Synthesized Magnetometer; axis label: Magnetometer]

2013 Subaru Crosstrek Undercarriage (Left) and 2013 Ford F-150 (Right)

Figure 11. Example Vehicle Signatures


5.3  Wavelet  Decomposition 

Another technique for classifying time domain signals is to utilize wavelets. As
the signatures shown in Figure 11 demonstrate the presence of peaks and troughs
in the magnetometer data, wavelet decomposition offers the possibility of capturing
coefficients that are relevant to distinguishing between multiple signatures. As
discussed in Section 3.8, wavelet decomposition was performed at the levels noted in
Table 6 to obtain the referenced number of high and low coefficients. The results of
the wavelet decomposition are noted in Table 16.


Table 16. Vehicle 10-Fold Cross-Validation Coefficient Results

Level^a   Correct   Incorrect   Inter-Category   Tree Size   Leaves
2         76        14          2                9           5
3         81        9           1                7           4
4         89        1           1                7           4
5         73        17          2                7           4

a  Decomposition level

By utilizing the DWT, the decision tree maker was given the ability to build a decision
tree from a representation of the control variable's magnetic characteristics
rather than purely from the control variable's statistical attributes.
Thus the distinctive peaks and troughs of the magnetometer's captured signal were
not lost during analysis, and multiple occurrences of each were presented to the J4.8.
The results demonstrate the ability of the DWT to present coefficients to the decision
tree maker that result in highly accurate inter-category classification and, depending
on the decomposition level, highly accurate overall classification rates.

The best overall results across both the statistical attribute decision trees and
the DWT coefficient decision trees are found at the fourth level of decomposition. The
results indicate an ability to correctly classify the control variables 98.89% of the
time. Using magnetometer data from a data capture of the Subaru undercarriage,
decomposition levels 1 through 4 can be seen in Figure 12. The decomposition was
performed in R with the wavelets package, utilizing the dwt function with levels set
to 5, boundary set to reflection, and fast set to false.
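
To make the idea of multi-level decomposition concrete, the following Python sketch performs a discrete wavelet transform with the Haar filter. The Haar filter is chosen here only for brevity; this section does not state which filter was used with the R wavelets package, and the reflection boundary handling is not replicated. The function names and the sample trace are hypothetical.

```python
import math


def haar_step(signal):
    """One DWT level: return (approximation, detail) coefficient lists.

    A trailing unpaired sample (odd-length input) is ignored.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_dwt(signal, levels):
    """Return (detail coefficients per level, final approximation)."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break  # nothing left to split
        approx, detail = haar_step(approx)
        details.append(detail)
    return details, approx


# Hypothetical magnetometer trace with a peak followed by a decline.
trace = [4.0, 4.0, 6.0, 10.0, 9.0, 5.0, 4.0, 4.0]
details, approx = haar_dwt(trace, levels=2)

for level, coeffs in enumerate(details, start=1):
    print("level", level, "detail:", [round(c, 3) for c in coeffs])
print("final approximation:", [round(c, 3) for c in approx])
```

Each level halves the number of coefficients, so deeper levels summarize broader features of the signature. The detail and approximation coefficients at a chosen level are the kind of values that could be passed to a decision tree builder as attributes.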

The ability to classify the vehicle signatures produced by the control variables in
experiment 2 provides for the possibility of expanding the set of classified vehicles
to a much larger set. Utilizing signal decomposition coefficients opens up the
ability to analyze the specific location of peaks and troughs in a signature, allowing
an algorithm to discriminate between different vehicles. As construction styles vary,
and component placement differs between makes and models, the ability to classify a
larger set of vehicles requires additional attention.

Experiment 2 and the analyzed results present the possibility of using a smart
phone as a type of scanning device. As scanning for activity has already been
researched and proven highly accurate in numerous studies, this ability comes as no
surprise. The ability to classify additional entities via such techniques presents
numerous opportunities for future research.


Figure 12. Example Vehicle Decomposition


VI. Conclusions


6.1  Entity  Recognition 

Analysis  of  the  experiment  output  revealed  the  ability  to  accurately  classify  entities 
via  sensor  data  gathered  by  smart  phones.  The  ability  to  accurately  classify  entities 
has  implications  across  a  number  of  disciplines.  The  fidelity  offered  by  the  diverse 
set  of  sensors  included  with  smart  phones,  as  well  as  the  current  trend  of  adding 
additional  sensors  to  smart  phones,  opens  up  an  exciting  world  of  classification  via 
smart  phone. 

6.2  Implications 

The  ability  to  recognize  entities  with  sensors  that  are  readily  available  in  smart 
phones  opens  up  a  number  of  possibilities,  far  too  many  to  list  exhaustively.  Possible 
avenues  for  entity  classification  cover  the  gamut  from  a  simple  logging  mechanism  to 
detailed  forensic  analysis  of  a  smart  phone. 

Smart phone users have access to apps that allow for recognition of a user's activity.
As  noted,  these  apps  allow  a  user  to  identify  not  just  the  presence  of  activity,  but 
the  form  of  activity,  type  of  transportation,  and  with  a  large  enough  feature  set,  the 
location  of  the  smart  phone  during  the  activity.  Entity  recognition  could  allow  a  user 
to  identify  microwave  oven  usage,  time  spent  at  a  computer  as  compared  to  watching 
television,  identification  of  a  particular  vehicle  being  driven,  exposure  to  overly  loud 
music,  and  many  other  scenarios. 

Taking things a step further, it may be possible to identify that a smart phone
user has entered a vehicle and, depending on smart phone carry location and
magnetic signature, whether they are a driver or passenger. Using accelerometer data,
the motion of the vehicle could be analyzed and assessed. If, for instance, the
algorithm determined that a vehicle has been involved in an accident, it may be
possible to alert first responders to the potential of a vehicle in distress.

Analysis  of  smart  phones  involved  in  house  fires  may  reveal  that  there  are  detectable 
signals.  With  the  inclusion  of  barometers  and  thermometers  in  smart  phones,  there  is 
the  possibility  for  data  streams  that  could  help  alert  first  responders  to  the  presence 
of  an  entity  requiring  attention. 

From  a  different  perspective,  the  ability  to  analyze  entities  from  the  point  of  view 
of  the  first  responders  may  allow  for  the  near  instantaneous  dispatch  of  additional 
assets.  With  the  ability  to  analyze  accelerative  and  decelerative  patterns  from  a  point 
of  transportation,  it  should  be  possible  to  identify  when  a  traffic  officer  gives  chase. 
The  same  activity  recognition  algorithms  could  identify  when  an  officer  has  to  leave 
their vehicle, either to enter a new chase phase or issue a ticket. If an officer is put
into a situation where they have to fire their sidearm, it may be possible to determine
that the sidearm was discharged, either from a potential compression in the air
surrounding the sidearm or via the microphone. This determination could be pushed
immediately to dispatch, allowing for more rapid response to situations.

The  ability  to  analyze  a  threat  environment  and  feed  information  to  dispatchers 
is  not  limited  to  police  officers.  In  a  combat  environment,  similar  occurrences  may 
also  be  detectable  and  thus  able  to  be  fed  back  to  a  command  center  for  additional 
processing and/or action. The potential to help first responders, crisis responders,
and combat personnel could prove helpful to commanders of all types.

The ability to scan an entity and determine which classification it fits into can also
aid in threat detection. If a vehicle is known to produce certain effects on a sensor and
it is producing effects that do not match expectations, there may be a need
for further investigation. Sometimes the effects detectable by the eyes or a camera
do not tell the whole story; a magnetic analysis may reveal the presence of anomalies.

Flocking observed from the behavior of large groups of people, combined with
entity detection, could be used to determine whether an active-shooter situation is
taking place. Scattering and/or hunkering down could indicate that an anomaly is
present in how people are behaving. Additional input from
microphones and other sensors may aid in locating a perpetrator.

The ability to collect data from sensors and save the entities interacted with
makes it possible to analyze smart phones in a forensic manner. This could be done to
establish timelines and whereabouts of a smart phone, and presumably of the associated
user, providing a signature of sorts for determining where a user was and what they
were doing.


6.3  Further  Research 

The work performed in this thesis helped determine that data gathered from smart
phone sensors can be analyzed to accurately recognize the entities used
as control variables. Some of the control variables were in categories disparate enough
from one another that inter-category misclassification proved unlikely. Even at
the subcategory level, between subwoofer frequencies and/or dB levels, the smart
phone sensors were able to read environmental variables with enough fidelity to
subcategorize the control variables with high accuracy. To advance this research,
a number of issues must be studied further.

The  first  issue  comes  from  the  limited  pool  of  entities  studied  in  this  thesis.  The 
thesis  proved  that  different  entities  of  similar  nature  can  be  accurately  identified  with 
the  correct  set  of  attributes.  Enlarging  the  pool  of  entities  with  a  standard  testing 
platform  would  allow  for  the  expansion  of  entities  recognizable  by  a  smart  phone. 
With enough entities, it may be possible to track someone's day not just in terms of
activity, but also in terms of interaction.

The second issue is the testing parameters. Any number of experimental designs
are possible when acquiring entity readings from smart phone sensors. A few of the
more readily apparent parameters are surface placement, smart phone
mobility, and distance. The smart phone can be placed on any number of different
surfaces, each having a different ability to vibrate and thus readily affecting
accelerometer and gyroscope measurements. The smart phone can be placed in a manner
that restricts movement through some hard attachment process, or it can be laid flat
on a surface that vibrates freely and thus may move the phone. Distances matter greatly
when detecting the magnetic field generated by control variables. These are just three
of the considerations that need to be addressed when designing an experiment.

The third issue has to do with the decision trees. The attributes were those
identified in the literature review as working well for activity recognition. The
attributes utilized worked for the entities chosen as control variables in the
experiments discussed in this thesis. Other attributes not included in the decision
trees discussed may be required to identify other entities. Additionally, it may be
that wavelet decomposition, when applied to the control variables in experiment 1,
would work just as well as it did for experiment 2.

The  fourth  issue  is  related  to  signatures.  Each  entity  was  captured  and  analyzed 
as  a  full  signature  after  being  trimmed  by  the  windowing  algorithm.  This  requires  an 
identifiable  sensing  of  start  and  stop  points  for  the  windowing  algorithm  in  order  to 
produce a signature for classification purposes. A process that captures a sample
over some time interval during an entity's active phase would prove more useful than
the requirement of a complete entity signature.

The fifth issue has to do with the windowing algorithm. The algorithm senses a
start and stop point based on sensor data. This works for some entities, but not
all. There are entities where a magnetic field may be experienced long before the
accelerometer detects changes in gravity or the gyroscope detects torque on the cell
phone. In the experiments discussed herein, the magnetometer was the source of trim
points for all entities except the two lowest dB level subwoofer entities. Figuring out
how to tie the sensors together into a coherent windowing algorithm may be necessary
if issue four above cannot be resolved.
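
One way such a fused windowing step might look is sketched below in Python. It is not the windowing algorithm used in this thesis, whose details are not given in this section. It assumes a simple threshold on each sensor's deviation from a known baseline and takes the union of the per-sensor windows; the function names, baselines, thresholds, and sample values are all hypothetical.

```python
def find_trim_points(samples, baseline, threshold):
    """Return (start, stop) indices of the span deviating from baseline, or None."""
    active = [i for i, v in enumerate(samples) if abs(v - baseline) > threshold]
    if not active:
        return None
    return active[0], active[-1]


def fused_trim_points(streams):
    """Union of per-sensor windows.

    `streams` maps a sensor name to (samples, baseline, threshold).
    """
    windows = [find_trim_points(*args) for args in streams.values()]
    windows = [w for w in windows if w is not None]
    if not windows:
        return None
    return min(w[0] for w in windows), max(w[1] for w in windows)


# Hypothetical streams: the magnetometer reacts before the accelerometer.
streams = {
    "magnetometer": ([50, 50, 58, 70, 66, 52, 50, 50], 50.0, 5.0),
    "accelerometer": ([1.0, 1.0, 1.0, 1.0, 1.3, 1.4, 1.2, 1.0], 1.0, 0.1),
}
window = fused_trim_points(streams)
print("fused window:", window)
```

Taking the union keeps the earliest start and the latest stop seen by any sensor, so an entity whose magnetic field is detected well before any vibration is not trimmed too aggressively.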

The  research  accomplished  in  this  thesis  proved  the  ability  to  utilize  the  sensors 
embedded  in  smart  phones  in  order  to  sense  and  classify  entities  external  to  the 
phone.  The  magnetometer,  accelerometer,  and  gyroscope  proved  able  to  sense  their 
respective  environmental  attributes  at  a  resolution  adequate  to  accurately  identify 
several  entities.  These  entities  produced  effects  that  were  both  unique  and  similar  to 
one another, requiring attributes from multiple sensors in order to obtain the most
accurate results. As such, a multimodal approach to sensor fusion was tested and
validated, paving the way for further research.